Regex Non-Capturing Groups and Lookarounds in Snowflake
If you don’t need the background or discussion of how they work and just want to download Snowflake UDFs that support regex non-capturing groups, lookaheads, and lookbehinds, you can download them here:
https://github.com/GregPavlik/SnowflakeUDFs/tree/main/RegularExpressions
Now for the background:
Snowflake supports regular expressions (regex) for string matching and replacements. If your regex skills are like mine, Snowflake’s regex implementation provides more than you’ll ever need.
For regex ninjas and people who want to use regular expression libraries, there are two commonly-used capabilities that this post explains Snowflake’s regex functions do not currently support: non-capturing groups and lookarounds.
Every once in a while I run into a customer who’s a regex ninja or wants to use a regex from a library that requires one of these capabilities.
It occurred to me that JavaScript supports regex with these features, and Snowflake supports JavaScript user defined functions (UDFs). To use a regex in Snowflake that has non-capturing groups or lookarounds, It’s a simple matter of writing a UDF.
The problem is writing a new UDF for each use of a regex reduces some of the main advantages of regular expressions including compactness and simplicity.
This lead me to write two general-purpose UDFs that approximate Snowflake’s REGEXP_REPLACE and RLIKE (synonym REGEXP_LIKE) as closely as possible while enabling non-capturing groups and lookarounds.
I named the JavaScript UDFs similar to the Snowflake functions they approximate, REGEXP_REPLACE2 and RLIKE2 (synonym REGEXP_LIKE2). I also overloaded the UDFs so that you can call them using minimal parameters or optional parameters the same as their base Snowflake functions.
Here’s an example of their usage:
-- Running the base function returns this error:
-- Invalid regular expression: 'bar(?=bar)', no argument for repetition operator: ?
select regexp_replace('foobarbarfoo', 'bar(?=bar)', '***');
-- Running the UDF approximating the base function returns foo***barfoo
select regexp_replace2('foobarbarfoo', 'bar(?=bar)', '***');
-- Running the base function returns this error:
-- Invalid regular expression: 'bar(?=bar)', no argument for repetition operator: ?
select rlike('foobarbarfoo', 'bar(?=bar)');
-- Running the UDF approximating the base function returns TRUE
select rlike2('foobarbarfoo', 'bar(?=bar)');
You can download the UDFs on my Github here: https://github.com/GregPavlik/SnowflakeUDFs/tree/main/RegularExpressions