Match

The match action returns an array of matches, similar to the core rematch function available in CF8 and compatible engines, but with far more options available.

For example, if you only need the first match, you can set limit to 1 and matching will stop after that match (instead of returning all matches and then discarding the rest).
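As a minimal sketch of the limit option, assuming the Regex object syntax used in the Usage Examples on this page (the pattern and variable names are illustrative only):

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset FiveLettersRx = new Regex('\b\w{5}\b') />

<!--- Stop after the first match instead of scanning the whole input. --->
<cfset FirstFive = FiveLettersRx.match( text=Input , limit=1 ) />

<!--- The result is still an array, so check its length before reading it. --->
<cfif ArrayLen(FirstFive)>
    <cfoutput>#FirstFive[1]#</cfoutput>
</cfif>
```

Checking the array length first keeps the code safe when the input contains no match at all.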

Match also supports a callback function, which allows you to execute CFML logic against every match and decide whether it should be included in the results. This allows more efficient filtering: for example, combined with limit it can be used to find the first match which meets certain criteria, whilst only matching until a suitable one is found.

A callback function can receive a variety of information about a match, and also accepts an arbitrary structure passed in using CallbackData. For full details of how to use callbacks, see the dedicated Callback page.

When you need to do more than return the text for each match, you can set ReturnType to return an array of groups found, a structure of named groups, or all three of these combined. The section below goes into details on this.

Since v0.4, when using a Regex object, there are shortcut methods which allow a non-default returntype to be suffixed onto the method name for cleaner syntax, e.g.:

RegexObj.match( text=Input , returntype='groups' )      => RegexObj.matchGroups( Input )
RegexObj.match( text=Input , returntype='namedgroups' ) => RegexObj.matchNamedGroups( Input )
RegexObj.match( text=Input , returntype='full' )        => RegexObj.matchFull( Input )
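
The following sketch shows one of these shortcuts in use. It assumes v0.4 or later and reuses the input and pattern from the Usage Examples on this page; the variable names are illustrative.

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />

<!--- Equivalent to ThreeLettersRx.match( text=Input , returntype='groups' ) --->
<cfset Groups = ThreeLettersRx.matchGroups( Input ) />

<!--- Each element is an array holding the captured groups of one match. --->
<cfloop array="#Groups#" index="CurGroups">
    <cfoutput>#CurGroups[1]# #CurGroups[2]#<br/></cfoutput>
</cfloop>
```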

Return Types

match

This returns a simple array of strings, each containing the entire text matched by the regex. This is the default if a return type is not set.

groups

This returns an array of arrays, which contain the text matched by each group captured by each match.

namedgroups

If you specify namedgroups for the returntype, you must either use native named groups in the regex pattern, or provide a list of names to be mapped to each group. The names can be given as either a StringList or an Array of strings. The result is an array of structs, with the numerical group matches mapped to the appropriate struct items.

If the number of group names provided exceeds the number of groups in the regex, the surplus names are not included in the results. Similarly, if there are more groups than group names provided, only the named ones are returned.

If you need both numerical and named groups, use full instead.
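
As a short sketch of supplying the names as an array rather than a list (the names second and third are arbitrary labels chosen for this illustration):

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />

<!--- GroupNames may be a list or an array; an array is used here. --->
<cfset Names = ['second','third'] />
<cfset Letters = ThreeLettersRx.match( text=Input , limit=1 , returntype='namedgroups' , groupnames=Names ) />

<!--- Letters[1] is a struct keyed by the supplied names. --->
<cfoutput>#Letters[1].second# #Letters[1].third#</cfoutput>
```

The names are applied to the groups in order, so second labels group 1 and third labels group 2.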

full

This combines the three returntypes above, returning an array of structs. Each struct has the keys match and groups, plus namedgroups if the optional groupnames is provided or if the pattern contains named groups via (?<name>...) syntax.
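
A minimal sketch of reading each key from a full result, reusing the input and pattern from the Usage Examples on this page (variable names are illustrative):

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />

<!--- groupnames is supplied so that the namedgroups key is present. --->
<cfset Result = ThreeLettersRx.match( text=Input , start=5 , limit=1 , returntype='full' , groupnames='first,second' ) />
<cfset Hit = Result[1] />

<cfoutput>
    #Hit.match#<br/>                         <!--- whole match text --->
    #Hit.groups[1]# #Hit.groups[2]#<br/>     <!--- groups by number --->
    #Hit.namedgroups.first#                  <!--- groups by name --->
</cfoutput>
```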

Object

Arguments

Name | Type | Required | Default | Notes
Text | String | yes | n/a | The text to match the regex against.
Start | Char Position | no | 1 | Position at which to start attempting to match. (1 is first character.)
Limit | Integer | no | 0 | Number of times to match before stopping. (0 is unlimited.)
ReturnType | Enum (match, groups, namedgroups, full) | no | "match" | Determines the structure of each array element in the return variable.
GroupNames | StringList or Array | no* | none | A list or array of names to label groups with. *Required for ReturnType namedgroups (unless Pattern uses named groups), optional for ReturnType full, ignored for other ReturnTypes.
Callback | Function | no | none | A function called each time a match is made. If the function returns false the match is excluded from results (and does not count towards limit). See Callbacks section for full details on function signature and how to use this feature.
CallbackData | Struct | no | none | A structure which is passed into the callback function.

Usage Examples

<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />
<cfset FiveLettersRx = new Regex('\b\w(\w{4})\b') />

<cfdump var=#ThreeLettersRx.match( Input )# />
Outputs: ['The','fox','the','dog']

<cfdump var=#FiveLettersRx.match( Input )# />
Outputs: ['quick','jumps','brown']

<cfdump var=#ThreeLettersRx.match( Input , 5 , 1 )# />
Outputs: ['fox']

<cfdump var=#FiveLettersRx.match( Input , 5 , 1 )# />
Outputs: ['quick']

<cfdump var=#ThreeLettersRx.match( text=Input , limit = 2 , returntype='groups' )# />
Outputs:
    [ ['h','e']
    , ['o','x']
    ]

<cfdump var=#FiveLettersRx.match( text=Input , limit = 2 , returntype='groups' )# />
Outputs:
    [ ['uick']
    , ['umps']
    ]

<cfdump var=#ThreeLettersRx.match( text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
Outputs:
    [ {first:'h',second:'e'}
    , {first:'o',second:'x'}
    ]

<cfdump var=#FiveLettersRx.match( text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
Outputs:
    [ {first:'uick'}
    , {first:'umps'}
    ]

<cfdump var=#ThreeLettersRx.match( Input , 5 , 2 , 'full' )# />
Outputs:
    [ { match:'fox' , groups:[ 'o' , 'x' ] }
    , { match:'the' , groups:[ 'h' , 'e' ] }
    ]

<cfdump var=#FiveLettersRx.match( Input , 5 , 2 , 'full' )# />
Outputs:
    [ { match:'quick' , groups:['uick'] }
    , { match:'jumps' , groups:['umps'] }
    ]

Tag

Attributes

Name | Type | Required | Default | Notes
Variable | VarName | no | "cfregex" | The variable which the result is assigned to.
Text | String | yes | n/a | The text to match the regex against.
Start | Char Position | no | 1 | Position at which to start attempting to match. (1 is first character.)
Limit | Integer | no | 0 | Number of times to match before stopping. (0 is unlimited.)
ReturnType | Enum (match, groups, namedgroups, full) | no | "match" | Determines the structure of each array element in the return variable.
GroupNames | StringList or Array | no* | none | A list or array of names to label groups with. *Required for ReturnType namedgroups (unless Pattern uses named groups), optional for ReturnType full, ignored for other ReturnTypes.
Callback | Function | no | none | A function called each time a match is made. If the function returns false the match is excluded from results (and does not count towards limit). See Callbacks section for full details on function signature and how to use this feature.
CallbackData | Struct | no | none | A structure which is passed into the callback function.
Modes | StringList | no | none | List of regex modes to apply to the pattern.

Usage Examples

<cfset Input = "The quick fox jumps over the lazy brown dog." />

<cfregex match variable="ThreeLetterWords" text=#Input# >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#ThreeLetterWords# />
Outputs: ['The','fox','the','dog']

<cfregex match variable="FiveLetterWords" text=#Input# >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#FiveLetterWords# />
Outputs: ['quick','jumps','brown']

<cfregex match variable="FirstThreeLetterWordFrom5thChar" text=#Input# start=5 limit=1 >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#FirstThreeLetterWordFrom5thChar# />
Outputs: ['fox']

<cfregex match variable="FirstFiveLetterWordFrom5thChar" text=#Input# start=5 limit=1 >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#FirstFiveLetterWordFrom5thChar# />
Outputs: ['quick']

<cfregex match variable="ThreeLetterGroups" text=#Input# limit=2 returntype="groups" >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#ThreeLetterGroups# />
Outputs:
    [ ['h','e']
    , ['o','x']
    ]

<cfregex match variable="FiveLetterGroups" text=#Input# limit=2 returntype="groups" >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#FiveLetterGroups# />
Outputs:
    [ ['uick']
    , ['umps']
    ]

<cfregex match variable="ThreeLetterNamedGroups" text=#Input# limit=2 returntype="namedgroups" groupnames="first,second" >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#ThreeLetterNamedGroups# />
Outputs:
    [ {first:'h',second:'e'}
    , {first:'o',second:'x'}
    ]

<cfregex match variable="FiveLetterNamedGroups" text=#Input# limit=2 returntype="namedgroups" groupnames="first,second" >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#FiveLetterNamedGroups# />
Outputs:
    [ {first:'uick'}
    , {first:'umps'}
    ]

<cfregex match variable="ThreeLetterFullInfo" text=#Input# start=5 limit=2 returntype="full" >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#ThreeLetterFullInfo# />
Outputs:
    [ { match:'fox' , groups:[ 'o' , 'x' ] }
    , { match:'the' , groups:[ 'h' , 'e' ] }
    ]

<cfregex match variable="FiveLetterFullInfo" text=#Input# start=5 limit=2 returntype="full" >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#FiveLetterFullInfo# />
Outputs:
    [ { match:'quick' , groups:['uick'] }
    , { match:'jumps' , groups:['umps'] }
    ]

Function

Arguments

Name | Type | Required | Default | Notes
Pattern | RegexString | yes | n/a | The regex pattern to compile into a Regex Object.
Text | String | yes | n/a | The text to match the regex against.
Start | Char Position | no | 1 | Position at which to start attempting to match. (1 is first character.)
Limit | Integer | no | 0 | Number of times to match before stopping. (0 is unlimited.)
ReturnType | Enum (match, groups, namedgroups, full) | no | "match" | Determines the structure of each array element in the return variable.
GroupNames | StringList or Array | no* | none | A list or array of names to label groups with. *Required for ReturnType namedgroups (unless Pattern uses named groups), optional for ReturnType full, ignored for other ReturnTypes.
Callback | Function | no | none | A function called each time a match is made. If the function returns false the match is excluded from results (and does not count towards limit). See Callbacks section for full details on function signature and how to use this feature.
CallbackData | Struct | no | none | A structure which is passed into the callback function.
Modes | StringList | no | none | List of regex modes to apply to the pattern.

Usage Examples

<cfset Input = "The quick fox jumps over the lazy brown dog." />

<cfdump var=#RegexMatch( '\b\w(\w)(\w)\b' , Input )# />
Outputs: ['The','fox','the','dog']

<cfdump var=#RegexMatch( '\b\w(\w{4})\b' , Input )# />
Outputs: ['quick','jumps','brown']

<cfdump var=#RegexMatch( '\b\w(\w)(\w)\b' , Input , 5 , 1 )# />
Outputs: ['fox']

<cfdump var=#RegexMatch( '\b\w(\w{4})\b' , Input , 5 , 1 )# />
Outputs: ['quick']

<cfdump var=#RegexMatch( pattern='\b\w(\w)(\w)\b' , text=Input , limit = 2 , returntype='groups' )# />
Outputs:
    [ ['h','e']
    , ['o','x']
    ]

<cfdump var=#RegexMatch( pattern='\b\w(\w{4})\b' , text=Input , limit = 2 , returntype='groups' )# />
Outputs:
    [ ['uick']
    , ['umps']
    ]

<cfdump var=#RegexMatch( pattern='\b\w(\w)(\w)\b' , text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
Outputs:
    [ {first:'h',second:'e'}
    , {first:'o',second:'x'}
    ]

<cfdump var=#RegexMatch( pattern='\b\w(\w{4})\b' , text=Input , limit = 2 , returntype='namedgroups' , groupnames='first,second' )# />
Outputs:
    [ {first:'uick'}
    , {first:'umps'}
    ]

<cfdump var=#RegexMatch( '\b\w(\w)(\w)\b' , Input , 5 , 2 , 'full' )# />
Outputs:
    [ { match:'fox' , groups:[ 'o' , 'x' ] }
    , { match:'the' , groups:[ 'h' , 'e' ] }
    ]

<cfdump var=#RegexMatch( '\b\w(\w{4})\b' , Input , 5 , 2 , 'full' )# />
Outputs:
    [ { match:'quick' , groups:['uick'] }
    , { match:'jumps' , groups:['umps'] }
    ]

Practical Examples

Whilst the Usage Examples above give examples of how the different options of Match can be used, this section gives practical situations to show why Match might be used.

Example 1

Obtain all IP addresses found in the input text:

<cfset Ip4Addresses = RegexMatch( '\b[12]?\d?\d(?:\.[12]?\d?\d){3}\b' , Input ) />

Example 2

Locate "TODO" tasks in CFML files:

<cfset TodoRx = new Regex('<!---\s*TODO:[^-]++(?s:(?!--->).)*--->|//\s*TODO:[^\n]+') />
<cfset Todos = StructNew() />

<cfdirectory action="list" name="ProjFiles" recurse="true" directory="/project" />
<cfloop query="ProjFiles">
    <cfif ProjFiles.Type NEQ 'File'><cfcontinue/></cfif>
    <cfset CurFilename = ProjFiles.Directory & '/' & ProjFiles.Name />

    <cfset Todos[CurFilename] = TodoRx.match( FileRead(CurFilename) ) />
</cfloop>
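
Once the struct is populated, the results can be listed per file. The sketch below is one possible way to do so; it uses a small stand-in Todos struct with hypothetical file names so that it runs on its own, and it assumes that files without any TODO comments map to an empty array.

```cfml
<!--- Stand-in for the Todos struct built by the loop above (hypothetical data). --->
<cfset Todos = StructNew() />
<cfset Todos['/project/index.cfm'] = ArrayNew(1) />
<cfset ArrayAppend( Todos['/project/index.cfm'] , '// TODO: validate input' ) />
<cfset Todos['/project/empty.cfm'] = ArrayNew(1) />

<cfloop collection="#Todos#" item="CurFilename">
    <!--- Skip files where the regex found nothing. --->
    <cfif ArrayLen( Todos[CurFilename] )>
        <cfoutput>#HtmlEditFormat(CurFilename)#<br/></cfoutput>
        <cfloop array="#Todos[CurFilename]#" index="CurTodo">
            <cfoutput>&nbsp;&nbsp;#HtmlEditFormat(CurTodo)#<br/></cfoutput>
        </cfloop>
    </cfif>
</cfloop>
```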

Example 3

Locate all CSS colour codes:

<cfregex match
    variable = "colours"
    text     = #CssCode#
    modes    = "case_insensitive"
    >
    <!--- #888888 --->
    \#[A-F0-9]{6}(?=\s*[;"'}])
    |
    <!--- #888 --->
    \#[A-F0-9]{3}(?=\s*[;"'}])
    |
    <!--- rgb(128,128,128) and rgba(128,128,128,0.5) --->
    \brgba?\s*\([^)]+\)(?=\s*[;"'}])
    |
    <!--- hsl(128,50%,50%) and hsla(128,50%,50%,0.5) --->
    \bhsla?\s*\([^)]+\)(?=\s*[;"'}])
</cfregex>

Example 4

Using match to extract function and argument code from a CFC. The entire match text is not required, only the text captured in the four groups, so returntype groups is used.

<cfregex
    action     = "match"
    variable   = "Funcs"
    text       = "#Arguments.ComponentCode#"
    returntype = "groups"
    >

    ## 1 :  space to hide tag from cfml compiler (unescaped whitespace is ignored)
    (< cffunction\ name=")

    ## 2 : match any name, excluding compile (handled separately)
    ((?!compile)[^"]++)

    ## 3 : remaining attribute text
    ("[^>]+)

    ## take unnecessary attributes out of group
    access="public"\ action>

    ## 4 : grab all the arguments (again, space to hide tag from compiler)
    ((?:
        \n\t+< cfargument.*?/>
    )*+)
</cfregex>