Quick Sample of Fuzzy Matching in Snowflake

This is a very quick sample of using the SOUNDEX and EDITDISTANCE functions in Snowflake to do fuzzy matching. The SOUNDEX_PROJECTION portion of the CTE shows how to project and persist the results of the SOUNDEX function to a column for quick and efficient matching in a where clause. After the quick match, the projecting an EDITDISTANCE column shows how to stack rank the results from closest to furthest. The goal is not to have a final, working solution, but rather to show a quick sample. A more fully-featured solution would have a table mapping the SOUNDEX results for common names that are nicknames for formal names. For example, both “Greg” and “Gregg” have the same SOUNDEX values, but “Gregory” has a different SOUNDEX value. To make this simple approach more functional, a table linking all SOUNDEX values for derivations of “Greg” can help. Other examples may include looking for “Catherine” by matching all SOUNDEX values for “Katherine”, “Cat”, “Kate”, “Katie”, etc.

set FIRST_NAME = 'Greg';
set LAST_NAME = 'Smith';

with NAMES as 
(
    select COLUMN1 as FIRST_NAME, COLUMN2 as LAST_NAME from (values 
         ('Greg', 'Smith')
        ,('Gray', 'Smith')
        ,('Greg', 'Smyth')
        ,('Craig', 'Smythe')
        ,('Gregory', 'Smithe')
        ,('Mike', 'Smith')
        ,('Gregg', 'Smith')
        ,('Gregg', 'Smithe')
  )
), SOUNDEX_PROJECTION as
(
    select   FIRST_NAME
            ,LAST_NAME
            ,soundex(FIRST_NAME)    as SOUNDEX_FIRST
            ,soundex(LAST_NAME)     as SOUNDEX_LAST
    from NAMES
)
select    FIRST_NAME
         ,LAST_NAME
         ,editdistance(concat(FIRST_NAME, ' ', LAST_NAME), concat($FIRST_NAME, ' ', $LAST_NAME)) as DISTANCE
from     SOUNDEX_PROJECTION
where    SOUNDEX_FIRST = soundex($FIRST_NAME)
    and  SOUNDEX_LAST  = soundex($LAST_NAME)
    and  DISTANCE <= 10
order by DISTANCE asc
;