
Quantifiers

A quantifier tells the regex engine to repeat the previous item a number (quantity) of times, according to one of three behaviours (greedy, lazy, or possessive).

Numeric Quantifiers

A numeric quantifier is delimited by the { and } metacharacters, and repeats the previous item an exact number of times ({a}), a minimum number of times ({a,}), or anywhere within a range of repetitions ({a,b}).
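As a rough illustration of the three numeric forms, here is a minimal sketch using Python's standard re module as a stand-in regex engine. The pattern names and the is_full_match helper are made up for this example and are not part of any library.

```python
import re

# {a}: exactly a repetitions
exact = re.compile(r"\d{4}")
# {a,}: at least a repetitions, with no upper limit
at_least = re.compile(r"\d{2,}")
# {a,b}: between a and b repetitions, inclusive
between = re.compile(r"\d{2,3}")


def is_full_match(pattern, text):
    """Return True when the whole of text matches pattern (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


print(is_full_match(exact, "2024"))      # True
print(is_full_match(exact, "202"))       # False
print(is_full_match(at_least, "12345"))  # True
print(is_full_match(between, "1234"))    # False
```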

Shorthand Quantifiers

In regex, a few quantifiers are used commonly enough to justify having their own shorthand notation: ? (zero or one), * (zero or more), and + (one or more).

Whilst these shorthands are useful, it is important not to get carried away and over-use them. If the range of repetition is known, specifying it explicitly may well improve readability and/or performance.
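The usual shorthands are ?, * and +, which correspond to the numeric forms {0,1}, {0,} and {1,}. The sketch below, written against Python's re module, checks that each pair behaves identically on a sample string, and then shows how an explicit range rejects input that an open-ended shorthand would accept. The EQUIVALENTS table and the same_matches helper are illustrative names only.

```python
import re

# Each shorthand paired with its explicit numeric form.
EQUIVALENTS = {
    "?": "{0,1}",
    "*": "{0,}",
    "+": "{1,}",
}


def same_matches(shorthand, text, item="a"):
    """Return True if item+shorthand and item+numeric form find identical matches."""
    explicit = EQUIVALENTS[shorthand]
    return re.findall(item + shorthand, text) == re.findall(item + explicit, text)


for quantifier in EQUIVALENTS:
    print(quantifier, same_matches(quantifier, "baaac aa"))

# When the range is known (say, one to three digits), stating it
# explicitly rejects input that the open-ended "+" would accept.
print(re.fullmatch(r"\d+", "1234") is not None)      # True
print(re.fullmatch(r"\d{1,3}", "1234") is not None)  # False
```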

Behaviours

Greedy

The default behaviour for a quantifier is to start by matching as much as possible, and only to relinquish characters if required to do so by backtracking (when the following instruction is unable to match).

Since greedy is the default behaviour, all quantifiers are greedy unless they are converted to lazy or possessive.

Whilst the traditional way of explaining quantifiers, for example "+", is to say "match one or more times", it better represents the behaviour of greedy quantifiers to say "match as many as possible, at least once".
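To make the "as many as possible, then relinquish" behaviour concrete, here is a small sketch using Python's re module. The sample text and variable names are invented for the example.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy ".+" first consumes everything up to the end of the text, then
# backtracks one character at a time until the following ">" can match.
greedy = re.search(r"<.+>", text)
print(greedy.group())  # <b>bold</b> and <i>italic</i>

# The first group takes every digit, then gives back exactly one so that
# the second group is able to match.
groups = re.search(r"(\d+)(\d)", "12345").groups()
print(groups)  # ('1234', '5')
```

In both cases the greedy quantifier ends up with the longest match that still lets the rest of the expression succeed.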

Lazy

If a quantifier is suffixed with ? then it becomes a lazy quantifier which will match as little as required, and only add more characters to its match if required to do so (again, because the following instruction does not yet match).

Lazy quantifiers are "??", "*?", "+?", "{a}?", "{a,}?", "{a,b}?".
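The following sketch contrasts lazy and greedy forms on the same input, again using Python's re module as an assumed stand-in engine. The sample strings are illustrative.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Lazy ".+?" starts with a single character and only grows when the
# following ">" fails to match, so each tag is matched separately.
tags = re.findall(r"<.+?>", text)
print(tags)  # ['<b>', '</b>', '<i>', '</i>']

# A lazy range stops at its minimum when nothing forces it further.
lazy_range = re.search(r"\d{2,4}?", "12345").group()
greedy_range = re.search(r"\d{2,4}", "12345").group()
print(lazy_range, greedy_range)  # 12 1234

# The lazy first group keeps one digit; the greedy second group takes the rest.
groups = re.search(r"(\d+?)(\d+)", "12345").groups()
print(groups)  # ('1', '2345')
```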

Whilst lazy quantifiers can be useful, it is often better to use a greedy (or possessive) quantifier combined with a negated character class, e.g. "[^x]+x" instead of ".+?x", but this will depend on exactly what is being matched.
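The two patterns mentioned above often give the same result, but not always, which is why the choice depends on the input. A minimal sketch using Python's re module (the function names are invented for the example):

```python
import re


def lazy_find(text):
    """Find matches using the lazy form ".+?x"."""
    return re.findall(r".+?x", text)


def negated_find(text):
    """Find matches using the negated character class form "[^x]+x"."""
    return re.findall(r"[^x]+x", text)


# On straightforward input the two forms agree.
print(lazy_find("abx cdx"))     # ['abx', ' cdx']
print(negated_find("abx cdx"))  # ['abx', ' cdx']

# They differ when the text starts with "x": the lazy "." is allowed to
# consume an "x", whereas "[^x]" can never do so.
print(lazy_find("xax"))     # ['xax']
print(negated_find("xax"))  # ['ax']
```

The negated class states up front what may not be consumed, so the engine never needs to test the following "x" after every single character.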

Possessive

A quantifier can be suffixed with + to make it possessive. This is identical to a greedy quantifier, except that it will not backtrack within itself, even if doing so would allow the overall expression to match; a possessive quantifier is all or nothing.

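Possessiveness only makes sense for quantifiers that can match a variable number of repetitions. If matching an exact number (e.g. {4}) there is no difference between possessive and greedy. The optional quantifier ? can vary between zero and one repetition, so making it possessive does change its behaviour: "a?a" matches the text "a" by backtracking, whereas "a?+a" does not.

Possessive quantifiers are "?+", "*+", "++", "{a,}+", "{a,b}+".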

Whilst incorrect use of possessive quantifiers can cause an expression not to match (and it might not be immediately obvious why), they are also an important feature which can help to improve performance by preventing unwanted backtracking.