{"id":511,"date":"2025-04-23T16:24:31","date_gmt":"2025-04-23T16:24:31","guid":{"rendered":"https:\/\/snowflake.pavlik.us\/?p=511"},"modified":"2025-04-25T17:43:59","modified_gmt":"2025-04-25T17:43:59","slug":"quick-sample-of-fuzzy-matching-in-snowflake","status":"publish","type":"post","link":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/","title":{"rendered":"Quick Sample of Fuzzy Matching in Snowflake"},"content":{"rendered":"\n<p><strong>Quick Sample of Fuzzy Matching in Snowflake<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Introduction<\/h3>\n\n\n\n<p>This post walks through a quick, practical example of fuzzy name matching using Snowflake SQL. The goal is to identify approximate matches based on phonetic similarity and spelling distance. We&#8217;ll progressively build up a simple pattern using <code>SOUNDEX<\/code> for fast phonetic filtering and <code>EDITDISTANCE<\/code> for final scoring. This isn&#8217;t a production-ready pipeline\u2014it\u2019s a conceptual starting point. A more advanced post will follow with a normalized nickname mapping approach.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Input Parameters<\/h3>\n\n\n\n<p>We begin by setting the target name to match against. These could be passed in dynamically or used in ad hoc analysis:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET FIRST_NAME = &#039;Greg&#039;;\nSET LAST_NAME  = &#039;Smith&#039;;\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 2: Sample Data<\/h3>\n\n\n\n<p>Next, we define a small set of names with intentional variations\u2014nicknames, spelling shifts, and common soundalikes. 
This is your test data:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nWITH NAMES AS (\n    SELECT COLUMN1 AS FIRST_NAME, COLUMN2 AS LAST_NAME FROM (\n        VALUES \n            (&#039;Greg&#039;, &#039;Smith&#039;),\n            (&#039;Gray&#039;, &#039;Smith&#039;),\n            (&#039;Greg&#039;, &#039;Smyth&#039;),\n            (&#039;Craig&#039;, &#039;Smythe&#039;),\n            (&#039;Gregory&#039;, &#039;Smithe&#039;),\n            (&#039;Mike&#039;, &#039;Smith&#039;),\n            (&#039;Gregg&#039;, &#039;Smith&#039;),\n            (&#039;Gregg&#039;, &#039;Smithe&#039;)\n    )\n),\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 3: Phonetic Projection with SOUNDEX<\/h3>\n\n\n\n<p>We calculate the phonetic representation of first and last names using <code>SOUNDEX<\/code>. This lets us filter out clearly unrelated candidates before calculating edit distance:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nSOUNDEX_PROJECTION AS (\n    SELECT   FIRST_NAME,\n             LAST_NAME,\n             SOUNDEX(FIRST_NAME) AS SOUNDEX_FIRST,\n             SOUNDEX(LAST_NAME)  AS SOUNDEX_LAST\n    FROM NAMES\n)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 4: Match by Edit Distance<\/h3>\n\n\n\n<p>Finally, we keep only the names whose first- and last-name <code>SOUNDEX<\/code> codes equal those of the target, then rank them by <code>EDITDISTANCE<\/code> from the target name:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nSELECT    FIRST_NAME,\n          LAST_NAME,\n          EDITDISTANCE(CONCAT(FIRST_NAME, &#039; &#039;, LAST_NAME),\n                       CONCAT($FIRST_NAME, &#039; &#039;, $LAST_NAME)) AS DISTANCE\nFROM      SOUNDEX_PROJECTION\nWHERE     SOUNDEX_FIRST = SOUNDEX($FIRST_NAME)\n      AND SOUNDEX_LAST  = SOUNDEX($LAST_NAME)\n      AND EDITDISTANCE(CONCAT(FIRST_NAME, &#039; 
&#039;, LAST_NAME),\n                       CONCAT($FIRST_NAME, &#039; &#039;, $LAST_NAME)) &lt;= 10\nORDER BY  DISTANCE ASC;\n\n<\/pre><\/div>\n\n\n<p>This query first filters on phonetic similarity and then scores the survivors by spelling distance, so the closest spellings sort to the top. A threshold of 10 is deliberately loose for names this short; lower it to trade recall for precision.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Final: Minimal Reproducible Example<\/h3>\n\n\n\n<p>Here\u2019s the entire working example in one block for easy copy-paste and experimentation:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET FIRST_NAME = &#039;Greg&#039;;\nSET LAST_NAME  = &#039;Smith&#039;;\n\nWITH NAMES AS (\n    SELECT COLUMN1 AS FIRST_NAME, COLUMN2 AS LAST_NAME FROM (\n        VALUES \n            (&#039;Greg&#039;, &#039;Smith&#039;),\n            (&#039;Gray&#039;, &#039;Smith&#039;),\n            (&#039;Greg&#039;, &#039;Smyth&#039;),\n            (&#039;Craig&#039;, &#039;Smythe&#039;),\n            (&#039;Gregory&#039;, &#039;Smithe&#039;),\n            (&#039;Mike&#039;, &#039;Smith&#039;),\n            (&#039;Gregg&#039;, &#039;Smith&#039;),\n            (&#039;Gregg&#039;, &#039;Smithe&#039;)\n    )\n),\nSOUNDEX_PROJECTION AS (\n    SELECT   FIRST_NAME,\n             LAST_NAME,\n             SOUNDEX(FIRST_NAME) AS SOUNDEX_FIRST,\n             SOUNDEX(LAST_NAME)  AS SOUNDEX_LAST\n    FROM NAMES\n)\nSELECT    FIRST_NAME,\n          LAST_NAME,\n          EDITDISTANCE(CONCAT(FIRST_NAME, &#039; &#039;, LAST_NAME),\n                       CONCAT($FIRST_NAME, &#039; &#039;, $LAST_NAME)) AS DISTANCE\nFROM      SOUNDEX_PROJECTION\nWHERE     SOUNDEX_FIRST = SOUNDEX($FIRST_NAME)\n      AND SOUNDEX_LAST  = SOUNDEX($LAST_NAME)\n      AND EDITDISTANCE(CONCAT(FIRST_NAME, &#039; &#039;, LAST_NAME),\n                       CONCAT($FIRST_NAME, &#039; &#039;, $LAST_NAME)) &lt;= 10\nORDER BY  DISTANCE ASC;\n\n<\/pre><\/div>\n\n\n<p>A follow-up article will extend 
this logic using a normalized mapping table (e.g., mapping <code>Catherine<\/code> to <code>Cat<\/code>, <code>Katie<\/code>, etc.) for formal\/informal name handling.<\/p>\n\n\n\n<p>Stay tuned.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Quick Sample of Fuzzy Matching in Snowflake Introduction This post walks through a quick, practical example of fuzzy name matching using Snowflake SQL. The goal is to identify approximate matches based on phonetic similarity and spelling distance. We&#8217;ll progressively build up a simple pattern using SOUNDEX for fast phonetic filtering and EDITDISTANCE for final scoring. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43],"tags":[],"class_list":["post-511","post","type-post","status-publish","format-standard","hentry","category-fuzzy-matching"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\r\n<title>Quick Sample of Fuzzy Matching in Snowflake - Snowflake in the Carolinas<\/title>\r\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\r\n<link rel=\"canonical\" href=\"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/\" \/>\r\n<meta property=\"og:locale\" content=\"en_US\" \/>\r\n<meta property=\"og:type\" content=\"article\" \/>\r\n<meta property=\"og:title\" content=\"Quick Sample of Fuzzy Matching in Snowflake - Snowflake in the Carolinas\" \/>\r\n<meta property=\"og:description\" content=\"Quick Sample of Fuzzy Matching in Snowflake Introduction This post walks through a quick, practical example of fuzzy name matching using Snowflake SQL. The goal is to identify approximate matches based on phonetic similarity and spelling distance. 
We&#8217;ll progressively build up a simple pattern using SOUNDEX for fast phonetic filtering and EDITDISTANCE for final scoring. [&hellip;]\" \/>\r\n<meta property=\"og:url\" content=\"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/\" \/>\r\n<meta property=\"og:site_name\" content=\"Snowflake in the Carolinas\" \/>\r\n<meta property=\"article:published_time\" content=\"2025-04-23T16:24:31+00:00\" \/>\r\n<meta property=\"article:modified_time\" content=\"2025-04-25T17:43:59+00:00\" \/>\r\n<meta name=\"author\" content=\"Greg Pavlik\" \/>\r\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\r\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Greg Pavlik\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\r\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/\"},\"author\":{\"name\":\"Greg Pavlik\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#\\\/schema\\\/person\\\/019455f4675665b6cf5edea31ec44d7b\"},\"headline\":\"Quick Sample of Fuzzy Matching in Snowflake\",\"datePublished\":\"2025-04-23T16:24:31+00:00\",\"dateModified\":\"2025-04-25T17:43:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/\"},\"wordCount\":251,\"commentCount\":0,\"articleSection\":[\"Fuzzy 
Matching\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/\",\"url\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/\",\"name\":\"Quick Sample of Fuzzy Matching in Snowflake - Snowflake in the Carolinas\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#website\"},\"datePublished\":\"2025-04-23T16:24:31+00:00\",\"dateModified\":\"2025-04-25T17:43:59+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#\\\/schema\\\/person\\\/019455f4675665b6cf5edea31ec44d7b\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2025\\\/04\\\/23\\\/quick-sample-of-fuzzy-matching-in-snowflake\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/snowflake.pavlik.us\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Quick Sample of Fuzzy Matching in Snowflake\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#website\",\"url\":\"https:\\\/\\\/snowflake.pavlik.us\\\/\",\"name\":\"Snowflake in the Carolinas\",\"description\":\"Random thoughts on all things Snowflake in the 
Carolinas\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/snowflake.pavlik.us\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#\\\/schema\\\/person\\\/019455f4675665b6cf5edea31ec44d7b\",\"name\":\"Greg Pavlik\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g\",\"caption\":\"Greg Pavlik\"},\"description\":\"Greg is a Senior Sales Engineer at Snowflake Computing, in the Raleigh-Durham area. He's been in data management and security for the twenty years.\"}]}<\/script>\r\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Quick Sample of Fuzzy Matching in Snowflake - Snowflake in the Carolinas","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/","og_locale":"en_US","og_type":"article","og_title":"Quick Sample of Fuzzy Matching in Snowflake - Snowflake in the Carolinas","og_description":"Quick Sample of Fuzzy Matching in Snowflake Introduction This post walks through a quick, practical example of fuzzy name matching using Snowflake SQL. The goal is to identify approximate matches based on phonetic similarity and spelling distance. 
We&#8217;ll progressively build up a simple pattern using SOUNDEX for fast phonetic filtering and EDITDISTANCE for final scoring. [&hellip;]","og_url":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/","og_site_name":"Snowflake in the Carolinas","article_published_time":"2025-04-23T16:24:31+00:00","article_modified_time":"2025-04-25T17:43:59+00:00","author":"Greg Pavlik","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Greg Pavlik","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/#article","isPartOf":{"@id":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/"},"author":{"name":"Greg Pavlik","@id":"https:\/\/snowflake.pavlik.us\/#\/schema\/person\/019455f4675665b6cf5edea31ec44d7b"},"headline":"Quick Sample of Fuzzy Matching in Snowflake","datePublished":"2025-04-23T16:24:31+00:00","dateModified":"2025-04-25T17:43:59+00:00","mainEntityOfPage":{"@id":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/"},"wordCount":251,"commentCount":0,"articleSection":["Fuzzy Matching"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/","url":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/","name":"Quick Sample of Fuzzy Matching in Snowflake - Snowflake in the 
Carolinas","isPartOf":{"@id":"https:\/\/snowflake.pavlik.us\/#website"},"datePublished":"2025-04-23T16:24:31+00:00","dateModified":"2025-04-25T17:43:59+00:00","author":{"@id":"https:\/\/snowflake.pavlik.us\/#\/schema\/person\/019455f4675665b6cf5edea31ec44d7b"},"breadcrumb":{"@id":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/snowflake.pavlik.us\/index.php\/2025\/04\/23\/quick-sample-of-fuzzy-matching-in-snowflake\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/snowflake.pavlik.us\/"},{"@type":"ListItem","position":2,"name":"Quick Sample of Fuzzy Matching in Snowflake"}]},{"@type":"WebSite","@id":"https:\/\/snowflake.pavlik.us\/#website","url":"https:\/\/snowflake.pavlik.us\/","name":"Snowflake in the Carolinas","description":"Random thoughts on all things Snowflake in the Carolinas","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/snowflake.pavlik.us\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/snowflake.pavlik.us\/#\/schema\/person\/019455f4675665b6cf5edea31ec44d7b","name":"Greg Pavlik","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g","caption":"Greg 
Pavlik"},"description":"Greg is a Senior Sales Engineer at Snowflake Computing, in the Raleigh-Durham area. He's been in data management and security for the twenty years."}]}},"_links":{"self":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts\/511","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/comments?post=511"}],"version-history":[{"count":11,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts\/511\/revisions"}],"predecessor-version":[{"id":529,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts\/511\/revisions\/529"}],"wp:attachment":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/media?parent=511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/categories?post=511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/tags?post=511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}