Regular Expressions in Proc SQL
Among the functions you can use in Proc SQL is PRXMATCH. At a single stroke, this increases the power of the SELECT statement quite dramatically. Now you can select records that match regular expressions.
This first example is looking for product names that contain words beginning with "h" and end with "r". For matching records, PRXMATCH will return the position of the substring that matched the regular expression; if there was no match, it will return 0.
A quick refresher on regular expressions:
- "!" is being used here as a delimiter. "/" is also commonly used in this way.
- "b" matches a "word boundary".
- "[Hh]" is a character class that matches either "H" or "h".
- "." matches any single character, and "*" is a repeat count meaning "0 or more times"
Among the matches found by the above query are:
Large Hover Mower
Easy Patio Heater
Hand Cultivator (Wood)
The last of these is one that was not wanted - there is a word boundary between the "h" and the "r". We could get rid of this in a number of ways, for example by changing the regular expression to "!b[Hh]w*[rR]b!", where "w" matches any "word character". Word characters are defined as alphanumerics, plus the underscore character.
Here is another example using the same dataset, in which we look for cats and dogs:
Here the "|" means "or", and the "i" at the end of the regular expression makes it case-insensitive. Among the matches found are:
Kennel (Large Dog)
This time the regular expression said nothing about word boundaries.