Groups are useful when you have a sub-expression that should be treated as a single unit, so that it can either be captured or repeated.
There are three types of group: capturing, non-capturing, and atomic.
A capturing group stores the text matched by its contents, which can then be used as a backreference within the expression, or returned to be acted upon outside of the regex (such as in a replacement string or function).
To create a capturing group, simply enclose the sub-expression in parentheses, as in "(abc)".
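As a minimal sketch of capturing, here is how a captured value might be read back and reused in Python's standard re module. The "item-(\d+)" pattern and the sample strings are made up purely for illustration.

```python
import re

# The parentheses around \d+ make it capturing group 1.
pattern = re.compile(r"item-(\d+)")

match = pattern.search("order item-42 shipped")
whole = match.group(0)    # the full match
number = match.group(1)   # only the text captured by group 1

# In a replacement string, \1 stands for the text captured by group 1.
replaced = pattern.sub(r"#\1", "item-7 and item-19")

# A replacement function receives the match object and can act on the capture.
doubled = pattern.sub(
    lambda m: "item-" + str(int(m.group(1)) * 2),
    "item-7 and item-19",
)

print(whole, number, replaced, doubled)
```

The same captured text is available in three places here: on the match object, in the replacement string, and inside the replacement function.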
Capturing groups can be nested - their capture number is counted based on the
position of their opening parenthesis, and captured content includes that of any
enclosed groups. That is, matching "(a(b)(c))((d)e)" against "abcde" results in
the five captured values "abc", "b", "c", "de", "d".
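The numbering rule can be checked directly. The sketch below uses Python's re module and assumes the input string "abcde", which is the text that produces the five values listed above.

```python
import re

# Groups are numbered by the position of their opening parenthesis:
# 1 = (a(b)(c)), 2 = (b), 3 = (c), 4 = ((d)e), 5 = (d)
nested = re.compile(r"(a(b)(c))((d)e)")

match = nested.fullmatch("abcde")
captures = match.groups()     # all captured values, in group-number order
group_count = nested.groups   # how many capturing groups the pattern has

print(captures)
```

Note that group 1 contains "abc" because an outer group's captured content includes whatever its enclosed groups matched.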
When you want to refer to the value of a captured group, you use what is known
as a backreference, which is the group number preceded by a backslash. So for
group 1 you use "\1", for group 2 you use "\2", and so on. It is possible to have
over a hundred groups, but it is not recommended to actually use this many -
if you have a regex with more than a dozen captured groups then you should
consider whether there is a better way to do whatever you are doing.
It is important to remember that a backreference is equivalent to the literal
text which was captured by the group, not the instructions within it. (For
example, "([abc])\1" will match "aa" or "bb" or "cc", but not a mixed pair such as "ab".)
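To make the difference concrete, the following sketch compares a backreference with simply repeating the character class. It uses Python's re module; the list of candidate strings is chosen only for illustration.

```python
import re

# \1 must match the exact text that group 1 captured.
backreference = re.compile(r"([abc])\1")
# Repeating the class instead allows any two characters from the set.
repeated_class = re.compile(r"[abc][abc]")

candidates = ["aa", "bb", "cc", "ab"]
with_backreference = {s: bool(backreference.fullmatch(s)) for s in candidates}
with_repeated_class = {s: bool(repeated_class.fullmatch(s)) for s in candidates}

print(with_backreference)
print(with_repeated_class)
```

Only the repeated character class accepts "ab", because it re-applies the instructions rather than the captured text.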
When you do not need the value of a group, but simply want to act upon it as a single item, you should use a non-capturing group, which is written with "?:" immediately after the opening parenthesis, as in "(?:abc)".
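A short sketch of the effect, using Python's re module: both patterns below repeat "ab" as a single unit, but only the first one stores it. The patterns and the input "ababab7" are illustrative assumptions.

```python
import re

# (ab)+ repeats "ab" and captures it (keeping the last repetition) as group 1.
capturing = re.compile(r"(ab)+(\d)")
# (?:ab)+ repeats "ab" in the same way, but stores nothing for it.
non_capturing = re.compile(r"(?:ab)+(\d)")

captured = capturing.fullmatch("ababab7").groups()
not_captured = non_capturing.fullmatch("ababab7").groups()

print(captured)
print(not_captured)
```

With the non-capturing version the digit becomes group 1, so the group numbers only count the values you actually care about.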
You can also combine a non-capturing group with a mode flag, to apply a particular regex mode only to the expression within the group.
For example, if there is a place where you need dot to match newline, but not for the
whole expression, then "(?s:.)" could be used.
Alternatively, you might have an expression which is case-insensitive except
for one small part; "(?-i:CASE IMPORTANT)" is a way to do that.
Non-capturing groups with flags can still be used for repetition.
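The flag-scoped groups described above can be tried out as follows. This is a sketch in Python's re module, which accepts this syntax from version 3.6 onwards; other regex engines may differ, and the patterns and test strings are invented for illustration.

```python
import re

# Only the first dot is inside (?s:...), so only it may match a newline.
scoped_dotall = re.compile(r"a(?s:.)b.c")
newline_inside_group = bool(scoped_dotall.fullmatch("a\nbxc"))
newline_outside_group = bool(scoped_dotall.fullmatch("a\nb\nc"))

# The whole pattern ignores case, except the part inside (?-i:...).
mostly_insensitive = re.compile(r"hello (?-i:WORLD)", re.IGNORECASE)
exact_case_part = bool(mostly_insensitive.fullmatch("HeLLo WORLD"))
wrong_case_part = bool(mostly_insensitive.fullmatch("hello world"))

# A flagged group is still a group, so it can be repeated as one unit.
repeated_flagged = re.compile(r"(?i:ab)+c")
repeated_ok = bool(repeated_flagged.fullmatch("aBABc"))
outside_still_sensitive = bool(repeated_flagged.fullmatch("aBABC"))

print(newline_inside_group, newline_outside_group)
print(exact_case_part, wrong_case_part)
print(repeated_ok, outside_still_sensitive)
```

In each pair, the second string fails because the character that needs the special mode falls outside the flagged group.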
Atomic groups are also non-capturing but they go a step further than simply treating a sub-expression as a single item - they prevent the regex engine from backtracking inside the group (whilst a normal non-atomic group allows backtracking to re-evaluate its contents).
This is an advanced feature that can help improve performance, but you should fully understand what backtracking is - when you want it and when you don't - before attempting to use atomic groups.
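As a rough illustration of the difference backtracking makes, the sketch below compares a normal group with an atomic one, written "(?>...)". It assumes Python's re module, which only supports atomic groups from version 3.11, so the atomic part is guarded by a version check. The patterns are a common teaching example, not taken from the text above.

```python
import re
import sys

# Assumption: the standard re module accepts (?>...) only on Python 3.11+.
ATOMIC_SUPPORTED = sys.version_info >= (3, 11)

# Normal group: after "bc" is tried and the final "c" fails, the engine
# backtracks into the group, tries "b" instead, and the match succeeds.
normal = re.compile(r"a(?:bc|b)c")
normal_matches_abc = bool(normal.fullmatch("abc"))

atomic_matches_abc = None
atomic_matches_abcc = None
if ATOMIC_SUPPORTED:
    # Atomic group: once "bc" has matched, the group is never re-evaluated,
    # so the final "c" has nothing left to match in "abc".
    atomic = re.compile(r"a(?>bc|b)c")
    atomic_matches_abc = bool(atomic.fullmatch("abc"))
    atomic_matches_abcc = bool(atomic.fullmatch("abcc"))

print(normal_matches_abc, atomic_matches_abc, atomic_matches_abcc)
```

The atomic version rejects "abc" even though a match exists, which is why it pays to understand when backtracking is wanted before reaching for this feature.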