
Regex Modes

A regex can be compiled with a number of different modes, which control aspects of how patterns are matched. Most of them relate to how newlines are detected and whether uppercase and lowercase characters are considered the same.

Using Modes

To apply a mode to a regex, supply the Modes parameter when compiling the regex, either via the dedicated Compile action or when using any of the functions or tags with a single-use pattern.

The parameter accepts a comma-delimited list of codes that determines which modes to turn on. (The code for each mode is listed below.)
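As a rough illustration of what a comma-delimited mode list amounts to, the sketch below converts such a list into the flag bitmask used by java.util.regex.Pattern. It is plain Java rather than cfRegex's own API: the `ModeList` class and `toFlags` method are hypothetical names, and the sketch assumes each mode code corresponds to the Pattern constant of the same name.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class ModeList {
    // Hypothetical helper: turns a comma-delimited list of mode codes into
    // the int bitmask accepted by Pattern.compile(regex, flags).
    static int toFlags(String modes) {
        int flags = 0;
        for (String code : modes.split(",")) {
            switch (code.trim().toUpperCase(Locale.ROOT)) {
                case "":                 break; // tolerate empty entries
                case "UNIX_LINES":       flags |= Pattern.UNIX_LINES;       break;
                case "CASE_INSENSITIVE": flags |= Pattern.CASE_INSENSITIVE; break;
                case "COMMENTS":         flags |= Pattern.COMMENTS;         break;
                case "MULTILINE":        flags |= Pattern.MULTILINE;        break;
                case "DOTALL":           flags |= Pattern.DOTALL;           break;
                case "UNICODE_CASE":     flags |= Pattern.UNICODE_CASE;     break;
                case "CANON_EQ":         flags |= Pattern.CANON_EQ;         break;
                default:
                    throw new IllegalArgumentException("Unknown mode: " + code);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Compile once with two modes turned on, then reuse the pattern.
        Pattern p = Pattern.compile("^abc$", toFlags("MULTILINE,CASE_INSENSITIVE"));
        System.out.println(p.matcher("xyz\nABC").find());
    }
}
```

Rejecting unknown codes with an exception, rather than silently ignoring them, is a design choice of this sketch and not something the text above specifies.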

Flags

Modes can also be defined as part of a regex pattern itself, in the form of flags which enable or disable that mode. (The flag for each mode is listed below.)

A flag can be turned on for the rest of the expression by using (?flag), turned off by using (?-flag), and applied to only part of an expression by using (?flag:expression).

For example:

(?i)this is case insensitive
(?i)this is case insensitive (?-i) but this is not
(?i:this is case insensitive) but this is not
(?i:this is (?-i:except this part) case insensitive)

You can enable or disable multiple flags at once, like so:

(?is)case insensitive and single-line enabled
(?m-d:multi-line enabled, but unix lines disabled)
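The scoping rules and combined flags above can be checked with a short sketch. It uses java.util.regex directly, on the assumption that cfRegex flags behave like Java's inline flags; the `InlineFlagDemo` class and its `matches` helper are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

final class InlineFlagDemo {
    // True when the whole input matches the regex. No modes are passed in,
    // so any mode in effect comes from flags inside the pattern itself.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // (?i) stays on until (?-i) turns it off again.
        System.out.println(matches("(?i)abc(?-i)def", "ABCdef")); // true
        System.out.println(matches("(?i)abc(?-i)def", "ABCDEF")); // false

        // (?i:...) applies only inside the group.
        System.out.println(matches("(?i:abc)def", "ABCDEF"));     // false

        // A nested (?-i:...) switches the mode off for the inner part only.
        System.out.println(matches("(?i:ab(?-i:cd)ef)", "ABcdEF")); // true

        // Two flags at once: case-insensitive and dot-all.
        System.out.println(matches("(?is)a.b", "A\nB"));          // true
    }
}
```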

In cfRegex, all modes are disabled by default, with the exception of Comment mode which is enabled when using the cfregex tag (but disabled for functions).

Available Modes

Unix Lines

code: UNIX_LINES
flag: d

Tells the regex engine that only the line feed character (\n) should be treated as a line terminator.

Carriage returns (\r) are not considered part of a newline.

This is significant when Multi-line mode is enabled and the ^ and $ markers are used to match the start and end of lines.
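The effect is easiest to see on text with Windows-style line endings. The following is a minimal sketch in plain Java, assuming UNIX_LINES behaves as the java.util.regex.Pattern flag of the same name; `UnixLinesDemo` and `findsWholeLine` are hypothetical names.

```java
import java.util.regex.Pattern;

final class UnixLinesDemo {
    // Looks for a complete line equal to `line`, always with Multi-line mode
    // on so that ^ and $ match at line boundaries. `extraFlags` lets the
    // caller add UNIX_LINES on top.
    static boolean findsWholeLine(String line, String input, int extraFlags) {
        String regex = "^" + Pattern.quote(line) + "$";
        return Pattern.compile(regex, Pattern.MULTILINE | extraFlags)
                      .matcher(input)
                      .find();
    }

    public static void main(String[] args) {
        String input = "one\r\ntwo";

        // Default: \r\n counts as a line ending, so "one" is a whole line.
        System.out.println(findsWholeLine("one", input, 0));                   // true

        // Unix Lines: only \n ends a line, so the first line is "one\r".
        System.out.println(findsWholeLine("one", input, Pattern.UNIX_LINES));    // false
        System.out.println(findsWholeLine("one\r", input, Pattern.UNIX_LINES));  // true
    }
}
```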

Case Insensitivity

code: CASE_INSENSITIVE
flag: i

When enabled, uppercase and lowercase letters are not differentiated, i.e. "abc" and "ABC" are considered the same.
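A minimal sketch of the difference, written against java.util.regex on the assumption that CASE_INSENSITIVE maps to the Pattern flag of the same name (`CaseInsensitiveDemo` is a hypothetical name):

```java
import java.util.regex.Pattern;

final class CaseInsensitiveDemo {
    // True when the whole input matches the regex under the given flags.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("abc", "ABC", 0));                        // false
        System.out.println(matches("abc", "ABC", Pattern.CASE_INSENSITIVE)); // true
    }
}
```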

Comment

code: COMMENTS
flag: x

Enables commenting and free-spacing mode.

All whitespace is ignored unless preceded by a backslash.

When a hash (#) is encountered, all content until the end of the line is ignored.
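The sketch below shows a pattern laid out across several lines with comments, plus the handling of plain and escaped spaces. It is plain Java and assumes COMMENTS behaves like the java.util.regex.Pattern flag of the same name; `CommentModeDemo` and `YEAR_MONTH` are hypothetical names.

```java
import java.util.regex.Pattern;

final class CommentModeDemo {
    // A pattern written in free-spacing style: the spaces and the
    // #-comments are only there for the human reader.
    static final String YEAR_MONTH =
        "(\\d{4})   # four-digit year\n" +
        "-          # literal separator\n" +
        "(\\d{2})   # two-digit month\n";

    // True when the whole input matches the regex under the given flags.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(YEAR_MONTH, "2024-05", Pattern.COMMENTS)); // true

        // An unescaped space is ignored, so "a b" behaves like "ab".
        System.out.println(matches("a b", "ab", Pattern.COMMENTS));           // true
        System.out.println(matches("a b", "a b", Pattern.COMMENTS));          // false

        // A backslash keeps the space as a literal character.
        System.out.println(matches("a\\ b", "a b", Pattern.COMMENTS));        // true
    }
}
```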

Multi-line

code: MULTILINE
flag: m

When enabled, the ^ and $ markers will match the start and end of each line, as opposed to just the start and end of the input (which can be matched using \A and \z instead).
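Counting matches on a three-line input makes the difference between line anchors and input anchors visible. This is a sketch in plain Java, assuming MULTILINE corresponds to the java.util.regex.Pattern flag of the same name; `MultilineDemo` and `count` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MultilineDemo {
    // Counts how many times the regex is found in the input.
    static int count(String regex, String input, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "alpha\nbeta\ngamma";

        // Without Multi-line, ^ and $ only match at the ends of the input.
        System.out.println(count("^\\w+$", input, 0));                   // 0

        // With Multi-line, every line is matched.
        System.out.println(count("^\\w+$", input, Pattern.MULTILINE));   // 3

        // \A still means start of input, even with Multi-line on.
        System.out.println(count("\\A\\w+", input, Pattern.MULTILINE));  // 1
    }
}
```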

Dot-All or Single-line

code: DOTALL
flag: s

By default, the . character matches everything except newlines.

The dot-all mode means . matches everything including newlines.
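A minimal sketch of the two behaviours, in plain Java and assuming DOTALL matches the java.util.regex.Pattern flag of the same name (`DotAllDemo` is a hypothetical name):

```java
import java.util.regex.Pattern;

final class DotAllDemo {
    // True when the whole input matches the regex under the given flags.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // By default the dot refuses to match the newline between a and b.
        System.out.println(matches("a.b", "a\nb", 0));              // false

        // With dot-all enabled, the newline is matched like any other character.
        System.out.println(matches("a.b", "a\nb", Pattern.DOTALL)); // true
    }
}
```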

Unicode Case-insensitivity

code: UNICODE_CASE
flag: u

This mode is similar to Case Insensitivity, but whilst that applies only to standard (US-ASCII) characters, this applies to Unicode characters as well.
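The sketch below compares the two modes on an accented letter. It is plain Java and assumes cfRegex follows java.util.regex here, where UNICODE_CASE only has an effect when combined with CASE_INSENSITIVE; whether cfRegex requires the same combination is an assumption. `UnicodeCaseDemo` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class UnicodeCaseDemo {
    // True when the whole input matches the regex under the given flags.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9"; // e with acute accent, lowercase
        String upper = "\u00c9"; // E with acute accent, uppercase

        // Case-insensitive alone only folds US-ASCII letters.
        System.out.println(matches(lower, upper, Pattern.CASE_INSENSITIVE)); // false

        // Adding UNICODE_CASE extends the folding to non-ASCII letters.
        System.out.println(matches(lower, upper,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));           // true
    }
}
```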

Canonical Equivalence

code: CANON_EQ
flag: c

When this flag is enabled, two characters will be considered to match if their full canonical decompositions match.
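As an illustration, the letter "å" can be written either as a single precomposed character or as "a" followed by a combining ring. The sketch below is plain Java and assumes CANON_EQ behaves like the java.util.regex.Pattern flag of the same name; `CanonEqDemo` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class CanonEqDemo {
    // True when the whole input matches the regex under the given flags.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A"; // "a" followed by a combining ring above
        String precomposed = "\u00E5"; // the single character a-with-ring

        // Without the flag, the two spellings are different character sequences.
        System.out.println(matches(decomposed, precomposed, 0));

        // With canonical equivalence, they are treated as the same text.
        System.out.println(matches(decomposed, precomposed, Pattern.CANON_EQ));
    }
}
```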