Publication date: 12/18/2019

Regex Match

Regex Match() returns an empty list (zero items) if the match fails. If the match succeeds, the first item in the list is the text of the entire match (backreference 0). The second item is the text that matches backreference 1, and so on.

Regex Match(source, pattern, <NULL>, <MATCHCASE>);

Unlike Regex(), Regex Match() is case insensitive by default. Include MATCHCASE for a case-sensitive match. Include NULL as the third argument if you want to match case but there is no replacement text.
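The following is a minimal sketch of the case handling and the empty-list result described above. The sample string and the variable names m1 and m2 are illustrative and do not come from the original example.

```jsl
// Default: case insensitive, so the uppercase pattern should still match.
m1 = Regex Match( "Hello World", "(HELLO) (WORLD)" );

// NULL fills the replacement-text slot so that MATCHCASE can follow it.
// With case sensitivity on, this pattern is not expected to match.
m2 = Regex Match( "Hello World", "(HELLO) (WORLD)", NULL, MATCHCASE );

// A failed match is an empty list, so test the item count before indexing.
If( N Items( m2 ) == 0,
	Print( "no case-sensitive match" ),
	Print( m2[1] ) // backreference 0: the entire match
);

Show( m1 );
```

Testing N Items() before indexing avoids reading an item from a list that has none.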

Example of Parsing Name-Value Pairs

The following example parses pairs of names and values.

Regex Match(
	"person=Fred id=77 friend= favorite=tea",
	"(\w+)=(\S*) (\w+)=(\S*) (\w+)=(\S*) (\w+)=(\S*)"
);

{"person=Fred id=77 friend= favorite=tea", "person", "Fred", "id", "77", "friend", "", "favorite", "tea"}

The \w+ matches one or more word characters. The \S* matches zero or more characters that are not spaces. In the resulting JSL list, the field names (person, id, friend, favorite) and their corresponding values (Fred, 77, "", tea) are separate strings.
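Because the names and values alternate in the returned list, they can be read back in pairs. The loop below is a sketch only; the variable name pairs and the output format are assumptions made for illustration.

```jsl
pairs = Regex Match(
	"person=Fred id=77 friend= favorite=tea",
	"(\w+)=(\S*) (\w+)=(\S*) (\w+)=(\S*) (\w+)=(\S*)"
);

// Item 1 is the entire match. Names sit at even positions,
// and each value follows its name at the next odd position.
// If the match fails, the list is empty and the loop does not run.
For( i = 2, i < N Items( pairs ), i += 2,
	Write( pairs[i] || " -> " || pairs[i + 1] || "\!N" )
);
```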

If the first argument to Regex Match() is a variable and a third argument specifies the replacement value, the matched text is replaced in the variable.
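As a sketch of this in-place replacement, the script below passes a variable as the first argument and an empty string as the replacement text. The variable names src and m are hypothetical, and the original does not show the resulting value of the variable, so Show() is used to inspect it rather than assuming a particular result.

```jsl
// src is a hypothetical variable that holds the source string.
src = "person=Fred id=77";

// The third argument is the replacement text. Here it is an empty string,
// so the matched text is expected to be removed from src.
m = Regex Match( src, "(\w+)=(\S*)", "" );

// Inspect both the returned list and the modified variable.
Show( m, src );
```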

Comparing Regex and Regex Match

Regex() and Regex Match() both match a pattern in a given string but return different results. To transform your string into another string, use Regex(). To identify the substrings that match specific parts of the pattern, use Regex Match().
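A short side-by-side sketch of the two return types, using an illustrative string and pattern (str, pat, oneString, and parts are names chosen for this sketch):

```jsl
str = "the dog chased the cat";
pat = "(cat|dog).*?(ate|chased)";

// Regex() returns a single string, built here from the replacement text.
oneString = Regex( str, pat, "\2 by \1" );

// Regex Match() returns a list: the entire match, then one item per group.
parts = Regex Match( str, pat );

Show( oneString, parts );
```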

The following example shows the efficiency of Regex Match() compared to Regex(). The source is a list of six strings. The goal is to extract portions of those six strings into the subject, verb, and object columns of a data table.

Figure 6.4 Final Data Table 

source = {"the cat ate the chicken", "the dog chased the cat", "did ralph like mary", "the girl pets the dog",
"these words are strange", "the cat was chased by the dog"};
 
dt = New Table( "English 101", // create the data table
	New Column( "subject", character ),
	New Column( "verb", character ),
	New Column( "object", character )
);

// iterate through the strings in the list
For( i = 1, i <= N Items( source ), i++,
	// assign the result of each match to matchList
	matchList = Regex Match(
		source[i],
		/* scan each string, match zero or more characters
		and one item in each group */
		".*?(cat|dog|ralph|girl).*?(ate|chased|like|pets).*?(chicken|cat|mary|dog)"
	);
	/* If matchList has zero items (string 5), don’t add a row
	to the table. Otherwise, put each matched string in a separate
	data table cell. */
	If( N Items( matchList ) > 0,
		dt << Add Rows( 1 );
		dt:subject = matchList[2]; // text matched by the first parenthesized group
		dt:verb = matchList[3]; // text matched by the second parenthesized group
		dt:object = matchList[4]; // text matched by the third parenthesized group
	);
);

For the sixth string, Regex Match() returns {"the cat was chased by the dog", "cat", "chased", "dog"} in a single call, with each answer in a separate list item. Compare this example to a similar one using Regex(), which returns one answer per call and uses a backreference in the replacement text to select each answer.

For( i = 1, i <= N Items( source ), i++,
	s = Regex( source[i], ".*?(cat|dog|ralph|girl).*?(ate|chased|like|pets).*?(chicken|cat|mary|dog)", "\1" ); // match an item in the first group
	v = Regex( source[i], ".*?(cat|dog|ralph|girl).*?(ate|chased|like|pets).*?(chicken|cat|mary|dog)", "\2" ); // match an item in the second group
	o = Regex( source[i], ".*?(cat|dog|ralph|girl).*?(ate|chased|like|pets).*?(chicken|cat|mary|dog)", "\3" ); // match an item in the third group
	If( !Is Missing( s ) & !Is Missing( v ) & !Is Missing( o ),
		dt << Add Rows( 1 );
		dt:subject = s; // return the match for \1
		dt:verb = v; // return the match for \2
		dt:object = o; // return the match for \3
	);
);

Backreferences are discussed in Backreferences and Capturing Groups.
