Find

The find action is similar to the core refind function of CFML, but much more flexible. You are not forced to call the function multiple times to find multiple matches; a single call returns every match, and you simply use the Limit argument if you need to stop after a certain number of matches have been made.

If the expression is not found, the result is always an empty array. If it does find the expression, the results depend on the returntype attribute, but will always be an array with each element referring to each of the matches.

If you are simply checking whether a regex matches or not, and do not need to know the character position at which it matches, use the Matches action, which is more efficient and returns a boolean true/false.

If you only need to know the text of a match (not its position), you can use the Match action, which can return a simple array of strings.

Since v0.4, when using a Regex object, there are shortcut methods allowing a returntype to be suffixed onto the method name for cleaner syntax, e.g.:

RegexObj.find( text=Input , returntype='pos' )  => RegexObj.findPos( Input )
RegexObj.find( text=Input , returntype='sub' )  => RegexObj.findSub( Input )
RegexObj.find( text=Input , returntype='info' ) => RegexObj.findInfo( Input )

Return Types

There are three possible structures which the find action can return. Which one to use depends on how much information you need.

For all three return types, if no match is found, an empty array is returned.
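
Since a failed search yields an empty array for every return type, a single length check covers the no-match case. The following is a minimal sketch using CFML's built-in ArrayLen function; the pattern, input text and variable names are illustrative only.

```cfml
<cfset DigitsRx = new Regex('\d+') />
<cfset Positions = DigitsRx.find( 'no digits here' ) />

<!--- An empty array means the regex was not found. --->
<cfif ArrayLen( Positions ) EQ 0>
    <cfset Summary = "No match found." />
<cfelse>
    <cfset Summary = "First match starts at position " & Positions[1] & "." />
</cfif>
<cfoutput>#Summary#</cfoutput>
```

The same check should work unchanged with the sub and info return types, since only the structure of the array elements differs.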

pos

This returns a simple array of the character positions at the start of each match. This is the default value if a return type is not set.

sub

This returns an array of structures, with each structure containing two keys, pos and len, indicating the character positions and the lengths for the match. The value of each key is an array, the first element of which relates to the overall match, whilst the rest of the elements relate to the groups.

(This is akin to setting returnsubexpressions to true with refind.)
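
Because sub returns only positions and lengths, the matched text has to be recovered from the original string. The following is a minimal sketch, assuming the pos/len layout shown in the usage examples on this page and using CFML's built-in Mid function; the variable names are illustrative.

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />

<!--- Stop after the first match; element 1 of pos/len is the overall match. --->
<cfset Results = ThreeLettersRx.find( text=Input , limit=1 , returntype='sub' ) />

<cfif ArrayLen( Results ) GT 0>
    <cfset FirstMatch = Results[1] />
    <!--- Mid( string , start , count ) recovers text from a position and a length. --->
    <cfset WholeMatch = Mid( Input , FirstMatch.pos[1] , FirstMatch.len[1] ) />
    <cfset GroupOne   = Mid( Input , FirstMatch.pos[2] , FirstMatch.len[2] ) />
    <cfoutput>#WholeMatch# / #GroupOne#</cfoutput>
</cfif>
```

Going by the documented sub output for this input (pos:[1,2,3], len:[3,1,1]), this should print "The / h". If the text itself is needed for every match, the info return type avoids the extra Mid calls.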

info

This returns an array of structures containing complete information for each match. The keys pos, len and match hold the character position, length, and text of the match respectively. The key groups holds an array with one structure per group found, each with the same pos, len and match keys.
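
The nested structure returned by info can be walked with two loops, one over the matches and one over each match's groups. The following is a minimal sketch, assuming the key names shown in the usage examples on this page (pos, len, match, groups); the variable names are illustrative.

```cfml
<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset FiveLettersRx = new Regex('\b\w(\w{4})\b') />
<cfset FoundMatches = FiveLettersRx.find( text=Input , limit=2 , returntype='info' ) />

<!--- Build one summary line per match, appending the text of each group. --->
<cfset Lines = [] />
<cfloop array=#FoundMatches# index="CurMatch">
    <cfset Line = CurMatch.match & " at " & CurMatch.pos />
    <cfloop array=#CurMatch.groups# index="CurGroup">
        <cfset Line = Line & " (group: " & CurGroup.match & ")" />
    </cfloop>
    <cfset ArrayAppend( Lines , Line ) />
</cfloop>

<cfdump var=#Lines# />
```

Going by the documented info output for this input, the dump should show "quick at 5 (group: uick)" and "jumps at 15 (group: umps)".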

Object

Arguments

Name        Type                 Required  Default  Notes
Text        String               yes       n/a      The text to find the regex within.
Start       Char Position        no        1        Position at which to start trying to find the regex. (1 is first character.)
Limit       Integer              no        0        Number of times to find the regex before stopping. (0 is unlimited.)
ReturnType  Enum (pos,sub,info)  no        "pos"    Determines the structure of each array element in the return variable.

Usage Examples

<cfset Input = "The quick fox jumps over the lazy brown dog." />
<cfset ThreeLettersRx = new Regex('\b\w(\w)(\w)\b') />
<cfset FiveLettersRx = new Regex('\b\w(\w{4})\b') />

<cfdump var=#ThreeLettersRx.find( Input )# />
Outputs: [1,11,26,41]

<cfdump var=#FiveLettersRx.find( Input )# />
Outputs: [5,15,35]

<cfdump var=#ThreeLettersRx.find( Input , 5 , 1 )# />
Outputs: [11]

<cfdump var=#FiveLettersRx.find( Input , 5 , 1 )# />
Outputs: [5]

<cfdump var=#ThreeLettersRx.find( text=Input , limit = 2 , returntype='sub' )# />
Outputs:
    [ { pos:[1,2,3]    , len:[3,1,1] }
    , { pos:[11,12,13] , len:[3,1,1] }
    ]

<cfdump var=#FiveLettersRx.find( text=Input , limit = 2 , returntype='sub' )# />
Outputs:
    [ { pos:[5,6]   , len:[5,4] }
    , { pos:[15,16] , len:[5,4] }
    ]

<cfdump var=#ThreeLettersRx.find( Input , 5 , 2 , 'info' )# />
Outputs:
    [ { pos:11 , len:3 , match:'fox' , groups:[ {pos:12,len:1,match:'o'} , {pos:13,len:1,match:'x'} ] }
    , { pos:26 , len:3 , match:'the' , groups:[ {pos:27,len:1,match:'h'} , {pos:28,len:1,match:'e'} ] }
    ]

<cfdump var=#FiveLettersRx.find( Input , 5 , 2 , 'info' )# />
Outputs:
    [ { pos:5  , len:5 , match:'quick' , groups:[{pos:6 ,len:4,match:'uick'}] }
    , { pos:15 , len:5 , match:'jumps' , groups:[{pos:16,len:4,match:'umps'}] }
    ]

Tag

Attributes

Name        Type                 Required  Default    Notes
Variable    VarName              no        "cfregex"  The variable which the array is assigned to.
Text        String               yes       n/a        The text to find the regex within.
Start       Char Position        no        1          Position at which to start trying to find the regex. (1 is first character.)
Limit       Integer              no        0          Number of times to find the regex before stopping. (0 is unlimited.)
ReturnType  Enum (pos,sub,info)  no        "pos"      Determines the structure of each array element in the return variable.
Modes       StringList           no        none       List of regex modes to apply to the pattern.

Usage Examples

<cfset Input = "The quick fox jumps over the lazy brown dog." />

<cfregex find variable="WordPositions" text=#Input#>
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs: [1,11,26,41]

<cfregex find variable="WordPositions" text=#Input#>
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs: [5,15,35]

<cfregex find variable="WordPositions" text=#Input# start=5 limit=1 >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs: [11]

<cfregex find variable="WordPositions" text=#Input# start=5 limit=1 >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs: [5]

<cfregex find variable="WordPositions" text=#Input# limit=2 returntype="sub" >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs:
    [ { pos:[1,2,3]    , len:[3,1,1] }
    , { pos:[11,12,13] , len:[3,1,1] }
    ]

<cfregex find variable="WordPositions" text=#Input# limit=2 returntype="sub" >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs:
    [ { pos:[5,6]   , len:[5,4] }
    , { pos:[15,16] , len:[5,4] }
    ]

<cfregex find variable="WordPositions" text=#Input# start=5 limit=2 returntype="info" >
    \b\w(\w)(\w)\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs:
    [ { pos:11 , len:3 , match:'fox' , groups:[ {pos:12,len:1,match:'o'} , {pos:13,len:1,match:'x'} ] }
    , { pos:26 , len:3 , match:'the' , groups:[ {pos:27,len:1,match:'h'} , {pos:28,len:1,match:'e'} ] }
    ]

<cfregex find variable="WordPositions" text=#Input# start=5 limit=2 returntype="info" >
    \b\w(\w{4})\b
</cfregex>
<cfdump var=#WordPositions# />
Outputs:
    [ { pos:5  , len:5 , match:'quick' , groups:[{pos:6 ,len:4,match:'uick'}] }
    , { pos:15 , len:5 , match:'jumps' , groups:[{pos:16,len:4,match:'umps'}] }
    ]

Function

Arguments

Name        Type                 Required  Default  Notes
Pattern     RegexString          yes       n/a      The regex pattern to compile into a Regex Object.
Text        String               yes       n/a      The text to find the regex within.
Start       Char Position        no        1        Position at which to start trying to find the regex. (1 is first character.)
Limit       Integer              no        0        Number of times to find the regex before stopping. (0 is unlimited.)
ReturnType  Enum (pos,sub,info)  no        "pos"    Determines the structure of each array element in the return variable.
Modes       StringList           no        none     List of regex modes to apply to the pattern.

Usage Examples

<cfset Input = "The quick fox jumps over the lazy brown dog." />

<cfdump var=#RegexFind( '\b\w(\w)(\w)\b' , Input )# />
Outputs: [1,11,26,41]

<cfdump var=#RegexFind( '\b\w(\w{4})\b' , Input )# />
Outputs: [5,15,35]

<cfdump var=#RegexFind( '\b\w(\w)(\w)\b' , Input , 5 , 1 )# />
Outputs: [11]

<cfdump var=#RegexFind( '\b\w(\w{4})\b' , Input , 5 , 1 )# />
Outputs: [5]

<cfdump var=#RegexFind( pattern='\b\w(\w)(\w)\b' , text=Input , limit = 2 , returntype='sub' )# />
Outputs:
    [ { pos:[1,2,3]    , len:[3,1,1] }
    , { pos:[11,12,13] , len:[3,1,1] }
    ]

<cfdump var=#RegexFind( pattern='\b\w(\w{4})\b' , text=Input , limit = 2 , returntype='sub' )# />
Outputs:
    [ { pos:[5,6]   , len:[5,4] }
    , { pos:[15,16] , len:[5,4] }
    ]

<cfdump var=#RegexFind( '\b\w(\w)(\w)\b' , Input , 5 , 2 , 'info' )# />
Outputs:
    [ { pos:11 , len:3 , match:'fox' , groups:[ {pos:12,len:1,match:'o'} , {pos:13,len:1,match:'x'} ] }
    , { pos:26 , len:3 , match:'the' , groups:[ {pos:27,len:1,match:'h'} , {pos:28,len:1,match:'e'} ] }
    ]

<cfdump var=#RegexFind( '\b\w(\w{4})\b' , Input , 5 , 2 , 'info' )# />
Outputs:
    [ { pos:5  , len:5 , match:'quick' , groups:[{pos:6 ,len:4,match:'uick'}] }
    , { pos:15 , len:5 , match:'jumps' , groups:[{pos:16,len:4,match:'umps'}] }
    ]