Encoded Characters

There are some characters which can be inconvenient, or even impossible, to represent natively in regex. To allow regex to still easily match text involving these characters, there are metacharacters which represent these characters in encoded form.

The most common of these are the newline, carriage return, and tab characters. Whilst all of these can usually be included as literal characters, it is generally more readable to use their encoded forms:

\n  newline
\r  carriage return
\t  tab
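As a rough illustration of these encoded forms, the sketch below uses Java's java.util.regex, on the assumption that it accepts the same escapes described on this page. The class and method names are hypothetical and exist only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class WhitespaceEscapes {
    // Splits a line on tab characters using the encoded form \t.
    // In Java source, "\\t" is the two-character regex \t.
    static String[] splitOnTabs(String line) {
        return Pattern.compile("\\t").split(line);
    }

    // Counts line breaks, treating a carriage return followed by a
    // newline as a single break.
    static int countLineBreaks(String text) {
        Matcher m = Pattern.compile("\\r\\n|\\r|\\n").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(splitOnTabs("a\tb\tc").length);               // 3
        System.out.println(countLineBreaks("one\r\ntwo\nthree\rfour")); // 3
    }
}
```

Writing the pattern with encoded forms keeps the whitespace visible in the source, which is the readability benefit mentioned above.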

For any other character, there are three ways to represent it: ASCII Hexadecimal, Unicode Hexadecimal, or Octal.

(Octal encoding is available, but it is rarely used, not widely known, and offers no advantages over the other two, so it is best avoided.)

ASCII

To encode a character using ASCII values, simply prefix the two-digit hex character code with backslash-x "\x".

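For example, the character "•" has the decimal value 149 in the Windows-1252 extension of ASCII (ASCII itself only covers 0 to 127), which is 95 in hexadecimal, so to use it in a regex you would write "\x95". Where the text is handled as Unicode, "\x95" refers to the control character U+0095, and the bullet is matched with "\u2022" instead.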
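The following sketch shows one way to build and use a two-digit hex escape. It assumes Java's java.util.regex, where "\x" followed by two hex digits names the code point with that value; other engines or byte-oriented encodings may behave differently. The helper name toHexEscape is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class HexEscapes {
    // Builds the escape \xhh for a character code that fits in two hex digits.
    static String toHexEscape(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("code must be between 0 and 255");
        }
        return String.format("\\x%02X", code);
    }

    public static void main(String[] args) {
        System.out.println(toHexEscape('A'));                 // \x41
        System.out.println(Pattern.matches("\\x41", "A"));    // true

        // Decimal 149 is 95 in hex. In Java the escape names code point
        // U+0095, so it matches that character and not the bullet U+2022.
        String escape = toHexEscape(149);
        System.out.println(escape);                            // \x95
        System.out.println(Pattern.matches(escape, "\u0095")); // true
        System.out.println(Pattern.matches(escape, "\u2022")); // false
    }
}
```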

Unicode

To encode a character using Unicode values, use backslash-u "\u" followed by the four-digit hex character code.

For example, the Unicode character "☺" has the code 263A, so in a regex pattern it is written as "\u263A".
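A minimal sketch of the four-digit Unicode form, again assuming Java's java.util.regex as the engine. The helper toUnicodeEscape is a hypothetical name, and it only handles characters that fit in a single UTF-16 code unit, which is what four hex digits can express.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class UnicodeEscapes {
    // Builds the four-digit escape (backslash-u plus four hex digits) for a char.
    static String toUnicodeEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    // Counts how often the character occurs in the text, via its escape.
    static int countOccurrences(char c, String text) {
        Matcher m = Pattern.compile(toUnicodeEscape(c)).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        char smiley = '\u263A'; // the smiling face character, code 263A
        System.out.println(toUnicodeEscape(smiley).length());          // 6
        System.out.println(countOccurrences(smiley, "hi \u263A \u263A")); // 2
    }
}
```

Padding to four digits matters here: a character such as tab (code 9) needs the escape 0009 after the prefix, not just 9.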

Octal

To encode a character with Octal, you use backslash-zero "\0" as the prefix, followed by a value between 0 and 377 (255 in decimal).

For example, the character "•" (decimal 149, as in the ASCII example) is 225 in octal, so it is encoded as "\0225" in a regex.
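For completeness, here is a sketch of the octal form, assuming Java's java.util.regex, where the escape names the code point with the given octal value. The helper toOctalEscape is hypothetical. It pads the value to three octal digits, a choice made for this example so that a digit following the escape in a longer pattern is not absorbed into it.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class OctalEscapes {
    // Builds the octal escape for a code between 0 and 0377 (255 decimal),
    // padded to three octal digits after the backslash-zero prefix.
    static String toOctalEscape(int code) {
        if (code < 0 || code > 0377) {
            throw new IllegalArgumentException("code must be between 0 and 255");
        }
        return String.format("\\0%03o", code);
    }

    public static void main(String[] args) {
        String escape = toOctalEscape(149);
        System.out.println(escape);                            // \0225
        // In Java this names code point U+0095 (decimal 149).
        System.out.println(Pattern.matches(escape, "\u0095")); // true
        // Tab is decimal 9, which is 011 in octal.
        System.out.println(Pattern.matches(toOctalEscape(9), "\t")); // true
    }
}
```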