The match action returns an array of matches, similar to the core rematch
function available in CF8 and compatible engines, but with far more options
available.
For example, if you only need the first match, you can set limit to 1 and
matching will stop after that match (instead of returning all matches and
then discarding the rest of them).
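As a minimal sketch of the limit option, assuming the Regex component is available so that new Regex(...) resolves (the variable names are illustrative only):

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />

<!--- limit=1 asks match to stop after the first successful match,
      so the returned array should hold at most one element --->
<cfset FirstOnly = ThreeLettersRx.match( text=Input , limit=1 ) />
<cfdump var=#FirstOnly# />
```

Going by the description above, this should dump a one-element array containing the first three-letter word, rather than all four three-letter words in the sentence.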
Match also supports a callback function, which allows you to execute CFML
logic against every match and decide whether it should be included in the
results. This allows more efficient filtering: for example, combined with
limit it can be used to find the first match which meets certain criteria,
whilst only needing to keep matching until a suitable one is found.
A callback function can receive a variety of information about a match,
and also accepts an arbitrary structure passed in using CallbackData.
For full details of how to use callbacks, see the dedicated Callback page.
When you need to do more than return the text for each match, you can set
ReturnType to return an array of groups found, a structure of named groups,
or all three of these combined. The section below goes into detail on this.
Since v0.4, when using a Regex object, there are shortcut methods allowing a non-default returntype to be suffixed onto the method name for cleaner syntax, e.g.:
RegexObj.match( text=Input , returntype='groups' ) => RegexObj.matchGroups( Input )
RegexObj.match( text=Input , returntype='namedgroups' ) => RegexObj.matchNamedGroups( Input )
RegexObj.match( text=Input , returntype='full' ) => RegexObj.matchFull( Input )
With returntype match, the result is a simple array of strings, each containing the entire text matched by the regex. This is the default if a return type is not set.
With returntype groups, the result is an array of arrays, each inner array containing the text matched by each group captured by that match.
If you specify namedgroups for the returntype, you must either use native
named groups in the regex pattern, or provide a list of names to be mapped
to each group (either a StringList or an Array of strings). The result will
then be an array of structs, with the numerical group matches mapped to the
appropriate struct items.
If the number of group names provided exceeds the number of groups in the regex, the surplus names are not included in the results. Similarly, if there are more groups than group names provided, only the named ones are returned.
If you need both numerical and named groups, use the full returntype instead.
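The native named group route can be sketched as follows. This assumes the underlying regex engine accepts Java-style (?<name>...) groups; the group names initial and rest are invented for illustration.

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />

<!--- The names come from the pattern itself, so no groupnames argument is passed --->
<cfset NamedRx = new Regex('\b(?<initial>\w)(?<rest>\w{2})\b') />
<cfset Named = NamedRx.match( text=Input , limit=2 , returntype='namedgroups' ) />

<!--- Each array element should be a struct keyed by group name --->
<cfdump var=#Named# />
```

Keeping the names inside the pattern means they cannot drift out of step with the groups, whereas a separate groupnames list has to be updated whenever a group is added or removed.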
The full returntype combines the three returntypes above, to return an array of structs.
Each struct has the keys match, groups, and namedgroups (if the optional
groupnames is provided, or if the pattern contains named groups via the
(?<name>...) syntax).
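A hedged sketch of reading a full result, assuming the key names described above; the group names second and third and the variable names are illustrative only.

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />
<cfset Results = ThreeLettersRx.match( text=Input , limit=2 , returntype='full' , groupnames='second,third' ) />

<cfloop array="#Results#" index="Cur">
	<cfoutput>
		<!--- match: entire matched text; groups: array of captured text;
		      namedgroups: struct keyed by the names supplied in groupnames --->
		Matched: #Cur.match#<br/>
		Groups: #ArrayToList( Cur.groups )#<br/>
		Named "second": #Cur.namedgroups['second']#<br/>
	</cfoutput>
</cfloop>
```

Using full avoids running the same regex twice when both the whole match and its captured groups are needed.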
Name | Type | Required | Default | Notes |
---|---|---|---|---|
Text | String | yes | n/a | The text to match the regex against. |
Start | Char Position | no | 1 | Position at which to start attempting to match. (1 is first character.) |
Limit | Integer | no | 0 | Number of times to match before stopping. (0 is unlimited.) |
ReturnType | Enum (match,groups, namedgroups,full) | no | "match" | Determines the structure of each array element in the return variable. |
GroupNames | StringList or Array | no* | none | An array of names to label groups with. *Required for ReturnType namedgroups (unless Pattern uses named groups), optional for ReturnType full, ignored for other ReturnTypes. |
Callback | Function | no | none | A function called each time a match is made. If function returns false the match is excluded from results (and does not count towards limit). See Callbacks section for full details on function signature and how to use this feature. |
CallbackData | Struct | no | none | A structure which is passed into the callback function. |
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />
<cfset FiveLettersRx = new Regex('\b\w(\w{4})\b') />
<cfdump var=#ThreeLettersRx.match( Input )# />
<cfdump var=#FiveLettersRx.match( Input )# />
<cfdump var=#ThreeLettersRx.match( Input , 5 , 1 )# />
<cfdump var=#FiveLettersRx.match( Input , 5 , 1 )# />
<cfdump var=#ThreeLettersRx.match( text=Input , limit = 2 , returntype='groups' )# />
<cfdump var=#FiveLettersRx.match( text=Input , limit = 2 , returntype='groups' )# />
<cfdump var=#ThreeLettersRx.match( text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
<cfdump var=#FiveLettersRx.match( text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
<cfdump var=#ThreeLettersRx.match( Input , 5 , 2 , 'full' )# />
<cfdump var=#FiveLettersRx.match( Input , 5 , 2 , 'full' )# />
Name | Type | Required | Default | Notes |
---|---|---|---|---|
Variable | VarName | no | "cfregex" | The variable which the result is assigned to. |
Text | String | yes | n/a | The text to match the regex against. |
Start | Char Position | no | 1 | Position at which to start attempting to match. (1 is first character.) |
Limit | Integer | no | 0 | Number of times to match before stopping. (0 is unlimited.) |
ReturnType | Enum (match,groups, namedgroups,full) | no | "match" | Determines the structure of each array element in the return variable. |
GroupNames | StringList or Array | no* | none | An array of names to label groups with. *Required for ReturnType namedgroups (unless Pattern uses named groups), optional for ReturnType full, ignored for other ReturnTypes. |
Callback | Function | no | none | A function called each time a match is made. If function returns false the match is excluded from results (and does not count towards limit). See Callbacks section for full details on function signature and how to use this feature. |
CallbackData | Struct | no | none | A structure which is passed into the callback function. |
Modes | StringList | no | none | List of regex modes to apply to the pattern. |
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfregex match variable="ThreeLetterWords" text=#Input# > \b\w(\w)(\w)\b </cfregex> <cfdump var=#ThreeLetterWords# />
<cfregex match variable="FiveLetterWords" text=#Input# > \b\w(\w{4})\b </cfregex> <cfdump var=#FiveLetterWords# />
<cfregex match variable="FirstThreeLetterWordFrom5thChar" text=#Input# start=5 limit=1 > \b\w(\w)(\w)\b </cfregex> <cfdump var=#FirstThreeLetterWordFrom5thChar# />
<cfregex match variable="FirstFiveLetterWordFrom5thChar" text=#Input# start=5 limit=1 > \b\w(\w{4})\b </cfregex> <cfdump var=#FirstFiveLetterWordFrom5thChar# />
<cfregex match variable="ThreeLetterGroups" text=#Input# limit=2 returntype="groups" > \b\w(\w)(\w)\b </cfregex> <cfdump var=#ThreeLetterGroups# />
<cfregex match variable="FiveLetterGroups" text=#Input# limit=2 returntype="groups" > \b\w(\w{4})\b </cfregex> <cfdump var=#FiveLetterGroups# />
<cfregex match variable="ThreeLetterNamedGroups" text=#Input# limit=2 returntype="namedgroups" groupnames="first,second" > \b\w(\w)(\w)\b </cfregex> <cfdump var=#ThreeLetterNamedGroups# />
<cfregex match variable="FiveLetterNamedGroups" text=#Input# limit=2 returntype="namedgroups" groupnames="first,second" > \b\w(\w{4})\b </cfregex> <cfdump var=#FiveLetterNamedGroups# />
<cfregex match variable="ThreeLetterFullInfo" text=#Input# start=5 limit=2 returntype="full" > \b\w(\w)(\w)\b </cfregex> <cfdump var=#ThreeLetterFullInfo# />
<cfregex match variable="FiveLetterFullInfo" text=#Input# start=5 limit=2 returntype="full" > \b\w(\w{4})\b </cfregex> <cfdump var=#FiveLetterFullInfo# />
Name | Type | Required | Default | Notes |
---|---|---|---|---|
Pattern | RegexString | yes | n/a | The regex pattern to compile into a Regex Object. |
Text | String | yes | n/a | The text to match the regex against. |
Start | Char Position | no | 1 | Position at which to start attempting to match. (1 is first character.) |
Limit | Integer | no | 0 | Number of times to match before stopping. (0 is unlimited.) |
ReturnType | Enum (match,groups, namedgroups,full) | no | "match" | Determines the structure of each array element in the return variable. |
GroupNames | StringList or Array | no* | none | An array of names to label groups with. *Required for ReturnType namedgroups (unless Pattern uses named groups), optional for ReturnType full, ignored for other ReturnTypes. |
Callback | Function | no | none | A function called each time a match is made. If function returns false the match is excluded from results (and does not count towards limit). See Callbacks section for full details on function signature and how to use this feature. |
CallbackData | Struct | no | none | A structure which is passed into the callback function. |
Modes | StringList | no | none | List of regex modes to apply to the pattern. |
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfdump var=#RegexMatch( '\b\w(\w)(\w)\b' , Input )# />
<cfdump var=#RegexMatch( '\b\w(\w{4})\b' , Input )# />
<cfdump var=#RegexMatch( '\b\w(\w)(\w)\b' , Input , 5 , 1 )# />
<cfdump var=#RegexMatch( '\b\w(\w{4})\b' , Input , 5 , 1 )# />
<cfdump var=#RegexMatch( pattern='\b\w(\w)(\w)\b' , text=Input , limit = 2 , returntype='groups' )# />
<cfdump var=#RegexMatch( '\b\w(\w{4})\b' , text=Input , limit = 2 , returntype='groups' )# />
<cfdump var=#RegexMatch( pattern='\b\w(\w)(\w)\b' , text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
<cfdump var=#RegexMatch( pattern='\b\w(\w{4})\b' , text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
<cfdump var=#RegexMatch( '\b\w(\w)(\w)\b' , Input , 5 , 2 , 'full' )# />
<cfdump var=#RegexMatch( '\b\w(\w{4})\b' , Input , 5 , 2 , 'full' )# />
The Usage Examples above show how the different options of Match can be used; this section gives practical situations to show why Match might be used.
Obtain all IP addresses found in the input text:
<cfset Ip4Addresses = RegexMatch( '\b[12]?\d{1,2}(?:\.[12]?\d{1,2}){3}\b' , Input ) />
Locate "TODO" tasks in CFML files:
<cfset TodoRx = new Regex('<!---\s*TODO:[^-]++(?s:(?!--->).)*--->|//\s*TODO:[^\n]+') />
<cfset Todos = StructNew() />
<cfdirectory action="list" name="ProjFiles" recurse="true" directory="/project" />
<cfloop query="ProjFiles">
<cfif ProjFiles.Type NEQ 'File'><cfcontinue/></cfif>
<cfset CurFilename = ProjFiles.Directory & '/' & ProjFiles.Name />
<cfset Todos[CurFilename] = TodoRx.match( FileRead(CurFilename) ) />
</cfloop>
Locate all CSS colour codes:
<cfregex match
variable = "colours"
text = #CssCode#
modes = "case_insensitive"
>
<!--- #888888 --->
\#[A-F0-9]{6}(?=\s*[;"'}])
|
<!--- #888 --->
\#[A-F0-9]{3}(?=\s*[;"'}])
|
<!--- rgb(128,128,128) and rgba(128,128,128,0.5) --->
\brgba?\s*\([^)]+\)(?=\s*[;"'}])
|
<!--- hsl(128,50%,50%) and hsla(128,50%,50%,0.5) --->
\bhsla?\s*\([^)]+\)(?=\s*[;"'}])
</cfregex>
This example uses match to extract function and argument code from a CFC.
The entire match text is not required, only the text captured in the four
groups, so returntype groups is used.
<cfregex
action = "match"
variable = "Funcs"
text = "#Arguments.ComponentCode#"
returntype = "groups"
>
## 1 : space to hide tag from cfml compiler (unescaped whitespace is ignored)
(< cffunction\ name=")
## 2 : match any name, excluding compile (handled separately)
((?!compile)[^"]++)
## 3 : remaining attribute text
("[^>]+)
## take unnecessary attributes out of group
access="public"\ action>
## 4 : grab all the arguments (again, space to hide tag from compiler)
((?:
\n\t+< cfargument.*?/>
)*+)
</cfregex>