A quantifier is something that tells the regex engine to repeat the previous item a number (quantity) of times, according to one of three behaviours (greedy, lazy, or possessive).
A numeric quantifier is delimited by the { and } metacharacters, and can repeat
an exact number of times, a minimum number, or a range of repetitions.
An exact number: "{a}" where "a" is any positive integer.
A minimum number: "{a,}"
A range: "{a,b}" where "b" is a positive integer greater than "a".
In regex, there are a number of quantifiers used commonly enough to justify having their own shorthand notation.
"?" instead of "{0,1}"
"*" instead of "{0,}"
"+" instead of "{1,}"
Whilst these shorthands are useful, it is important not to get carried away and over-use them. If there is a known range of repetition it may well result in greater readability and/or performance to explicitly specify the range.
The default behaviour for a quantifier is to start by matching as much as possible, and only to relinquish characters if required to do so by backtracking (when the following instruction is unable to match).
Since greedy is the default behaviour, all quantifiers are greedy unless they are converted to lazy or possessive.
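As a rough illustration of the forms described so far, the following sketch uses Python's built-in "re" module (an assumption; the text itself is not tied to any one regex flavour). The helper names leading_match and shorthand_agrees are made up for this example. It shows each numeric form greedily taking as many repetitions as it is allowed, and checks that each shorthand behaves like its numeric equivalent on a few sample strings.

```python
import re

def leading_match(pattern, text):
    """Return what the pattern matches at the start of text, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

text = "aaaaa"

# Being greedy, each numeric form takes as many repetitions as it is allowed.
print(leading_match(r"a{3}", text))    # aaa   (exactly three)
print(leading_match(r"a{2,}", text))   # aaaaa (two or more, so all five)
print(leading_match(r"a{2,4}", text))  # aaaa  (between two and four, so four)

# Each shorthand paired with the numeric form it abbreviates.
SHORTHANDS = [("a?", "a{0,1}"), ("a*", "a{0,}"), ("a+", "a{1,}")]

def shorthand_agrees(short, numeric, samples=("", "a", "aa", "aaab", "baa")):
    """Check that both spellings match the same text for every sample."""
    return all(leading_match(short, s) == leading_match(numeric, s)
               for s in samples)

for short, numeric in SHORTHANDS:
    print(short, numeric, shorthand_agrees(short, numeric))  # True each time
```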
Whilst the traditional way of explaining quantifiers, for example "+", is to
say "match one or more times", it better represents the behaviour of greedy
quantifiers to say "match as many as possible, at least once".
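The "as many as possible, then give back only if forced" behaviour can be observed with a capturing group around the quantifier. This is a minimal sketch using Python's "re" module (an assumption, other backtracking engines should behave the same way for these patterns); the sample strings are invented for illustration.

```python
import re

# Greedy "+" takes all three "a" characters; "b" then matches straight away.
m1 = re.match(r"(a+)b", "aaab")
print(m1.group(1))  # aaa

# Here "a+" again starts with "aaa", but the following "a" cannot match "b",
# so the engine backtracks and the quantifier relinquishes one character.
m2 = re.match(r"(a+)a", "aaab")
print(m2.group(1))  # aa

# The same effect with ".*": it first takes the whole string, then gives back
# only the single character that the following "\d" needs.
m3 = re.match(r"(.*)(\d)", "abc123")
print(m3.groups())  # ('abc12', '3')
```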
If a quantifier is suffixed with ? then it becomes a lazy quantifier which
will match as little as required, and only add more characters to its match if
required to do so (again, because the following instruction does not yet match).
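A small side-by-side comparison may make the difference clearer. The sketch below assumes Python's "re" module and an invented sample string; it is meant only to show the direction in which each kind of quantifier adjusts its match.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".+" runs to the end of the text, then backs off to the last ">".
greedy = re.search(r"<.+>", text).group()
# Lazy: ".+?" takes one character, then stops at the first ">" it can reach.
lazy = re.search(r"<.+?>", text).group()
print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>

# With a range, greedy starts from the maximum and lazy from the minimum.
greedy_range = re.match(r"a{2,4}", "aaaaa").group()
lazy_range = re.match(r"a{2,4}?", "aaaaa").group()
print(greedy_range)  # aaaa
print(lazy_range)    # aa

# A lazy quantifier still expands when the following item needs it to.
lazy_expanded = re.match(r"a{2,4}?b", "aaab").group()
print(lazy_expanded)  # aaab
```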
Lazy quantifiers are "??", "*?", "+?", "{a}?", "{a,}?", "{a,b}?".
Whilst lazy quantifiers can be useful, it is often better to use a
greedy (or possessive) quantifier combined with a negated character class,
e.g. "[^x]+x" instead of ".+?x", but this will depend on exactly what is
being matched.
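The two spellings are not always interchangeable, which is why the choice depends on what is being matched. The sketch below (Python's "re" module and the sample strings are assumptions made for illustration) shows a case where they agree, and one where the lazy dot is allowed to run past an "x" while the negated class is not.

```python
import re

# On their own, both forms stop at the first "x".
simple = "abcx defx"
lazy_simple = re.search(r".+?x", simple).group()
negated_simple = re.search(r"[^x]+x", simple).group()
print(lazy_simple)     # abcx
print(negated_simple)  # abcx

# With something required after the "x", the lazy dot keeps expanding,
# crossing the first "x", until the rest of the pattern can match.
tricky = "abx cdx!"
lazy_tricky = re.search(r".+?x!", tricky).group()
# The negated class can never contain an "x", so the match has to start
# after the first "x" instead.
negated_tricky = re.search(r"[^x]+x!", tricky).group()
print(repr(lazy_tricky))     # 'abx cdx!'
print(repr(negated_tricky))  # ' cdx!'
```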
A quantifier can be suffixed with + to make it possessive. This is identical to
a greedy quantifier, except it will not backtrack within itself: even if to do
so would allow the overall expression to match, a possessive quantifier is all
or nothing.
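Possessiveness only makes sense for quantifiers that can match a variable
number of characters. If matching an exact number (e.g. {4}) of a single
character, there is no difference between possessive and greedy. Matching one
or zero (i.e. ?) is still variable, so "?+" does differ from "?": against the
text "a", the expression "a?a" matches but "a?+a" does not.
Possessive quantifiers are "?+", "*+", "++", "{a,}+", "{a,b}+".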
Whilst incorrect use of possessive quantifiers can cause an expression not to match (and it might not be immediately obvious why), they are also an important feature which can help to improve performance by preventing unwanted backtracking.
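The "expression no longer matches" pitfall is easy to reproduce. The sketch below assumes Python, whose "re" module only accepts possessive syntax from version 3.11 onwards, so the possessive patterns are guarded by a version check; the helper name matches and the POSSESSIVE_SUPPORTED flag are made up for this example. Other flavours (Java and PCRE, for instance) use the same "++" notation.

```python
import re
import sys

# Assumption: the possessive syntax is only accepted by Python's "re" module
# from version 3.11, so the possessive patterns are skipped on older versions.
POSSESSIVE_SUPPORTED = sys.version_info >= (3, 11)

def matches(pattern, text):
    """Return True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# Greedy: "a+" takes "aaa", then gives one back so the final "a" can match.
print(matches(r"a+a", "aaa"))  # True

if POSSESSIVE_SUPPORTED:
    # Possessive: "a++" takes "aaa" and will not give anything back,
    # so the final "a" has nothing left to match.
    print(matches(r"a++a", "aaa"))   # False

    # When the following item never needs those characters,
    # possessive and greedy give the same result.
    print(matches(r"a++b", "aaab"))  # True
```

The last case is the one where possessive quantifiers are safe to use: the following item could never match what the quantifier consumed, so nothing useful is lost by refusing to backtrack.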