{"id":48,"date":"2019-05-30T21:56:29","date_gmt":"2019-05-30T21:56:29","guid":{"rendered":"https:\/\/snowflake.pavlik.us\/?p=48"},"modified":"2019-09-10T16:48:39","modified_gmt":"2019-09-10T16:48:39","slug":"geolocation-of-ip-addresses-in-snowflake","status":"publish","type":"post","link":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/","title":{"rendered":"Geolocation of IP Addresses in Snowflake"},"content":{"rendered":"\n<p>It&#8217;s probably a safe assumption that business intelligence and data science teams can get valuable insights knowing the geolocation of website visitors. Suppose a product launch gets lots of web traffic, but the only source of information on visitors is the web log. Some web server statistics report on traffic grouped by nation, but what if we want to get much more granular information and incorporate this information with the main data warehouse?<\/p>\n\n\n\n<p>Let&#8217;s take one sample web server among many, Apache Web Server, and quickly examine the structure of a log entry. Here&#8217;s a line in a sample Apache Web Server log.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n64.242.88.10 - - &#x5B;07\/Mar\/2004:16:05:49 -0800] &quot;GET \/twiki\/bin\/edit\/Main\/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP\/1.1&quot; 401 12846\n<\/pre><\/div>\n\n\n<p>In the Apache documentation, we can get more detailed information on the meaning of each field in the line, but for now we&#8217;re going to concentrate on 1) how to load the web logs into Snowflake, and 2) the key aspects for business intelligence and geolocation.<\/p>\n\n\n\n<p>Loading the data is a quick proposition. Even without reading the Apache documentation it&#8217;s clear that the web log is space delimited and wraps any fields with spaces inside double quotes. Snowflake provides a very simple way to ingest structured data in flat files using File Formats. You can create a file format using SnowSQL (documented here: <a href=\"https:\/\/docs.snowflake.net\/manuals\/sql-reference\/sql\/create-file-format.html\">https:\/\/docs.snowflake.net\/manuals\/sql-reference\/sql\/create-file-format.html<\/a>) or you can use the Snowflake Web UI (documented here: <a href=\"https:\/\/docs.snowflake.net\/manuals\/user-guide\/data-load-web-ui.html#step-1-open-the-load-data-wizard\">https:\/\/docs.snowflake.net\/manuals\/user-guide\/data-load-web-ui.html#step-1-open-the-load-data-wizard<\/a>).<\/p>\n\n\n\n<p>Note: Although the Apache Web Log is space delimited, we will use the CSV option for the Snowflake File Format &#8212; simply change the delimiter from a comma to a space.<\/p>\n\n\n\n<p>After loading the data, we now need to geolocate the web hits. Problem: the Apache Web Server log, as most web logs, does not show anything for geolocation. Fortunately in most cases, we can use the visitor&#8217;s IP address to get fairly accurate geolocation.<\/p>\n\n\n\n<p>Third party services keep up to date databases of IPv4 and IPv5 geolocation data. Once such service I found at <a href=\"https:\/\/lite.ip2location.com\">https:\/\/lite.ip2location.com<\/a> includes free databases with less rich information than the paid versions. In my testing I found the free databases accurate and useful, though production BI or data science work should consider the paid versions.<\/p>\n\n\n\n<p>The next question is how to resolve millions or billions of web log lines&#8217; IP address  to approximate geolocation. This is where Snowflake shines. The IP2Location LITE comes as a flat structured file with millions of rows  After creating another Snowflake File Format, it&#8217;s an easy matter to turn the IP2Location flat file into a Snowflake table. From there, Snowflake&#8217;s powerful massive-scale join make it a simple matter to create a joined view that shows the IP&#8217;s approximate location.<\/p>\n\n\n\n<p>There&#8217;s one final issue to get this all working. The IP2Location data comes with IPs represented by 32-bit integers, not the familiar dotted quad notation. This makes it much easier to use code and database routines that search for ranges of IPs that all happen to be in the same area by specifying a lower and upper range for the IP number.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nIP Dotted Quad:     IP 32-bit Integer, Decimal\n192.168.1.1         3232235777\n<\/pre><\/div>\n\n\n<p>This means we need to convert our dotted quad values into 32-bit integer values. Fortunately, Snowflake makes that easy with a UDF (User Defined Function):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: jscript; title: ; notranslate\" title=\"\">\n--Convert an IPv4 Dotted Quad into an IP Number\ncreate or replace function IpToInteger(ipAddress varchar)\n  returns double \n  language javascript\n  strict\n  as &#039;\n    \n    var ipQuad = IPADDRESS.split(&quot;.&quot;);\n \n    var quad1 = parseInt(ipQuad&#x5B;0]);\n    var quad2 = parseInt(ipQuad&#x5B;1]);\n    var quad3 = parseInt(ipQuad&#x5B;2]);\n    var quad4 = parseInt(ipQuad&#x5B;3]);\n    return (quad1 * 16777216) + (quad2 * 65536) + (quad3 * 256) + quad4;\n\n  &#039;;\n<\/pre><\/div>\n\n\n<p>You may have noticed that the function returns a double, when we know the return will be a 32-bit integer. The reason is that all numeric values in Javascript are floating point, and fortunately we won&#8217;t be encountering any issues with precision (i.e. 0.9999999999 != 1) since we&#8217;re dealing with only multiplication and addition.<\/p>\n\n\n\n<p>This is one possible approach, but running the Javascript millions or billions of times to get the value concerns me from a performance perspective, so in my next post I&#8217;ll explore the possibility of using straight SnowSQL to convert a dotted quad into a 32-bit integer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s probably a safe assumption that business intelligence and data science teams can get valuable insights knowing the geolocation of website visitors. Suppose a product launch gets lots of web traffic, but the only source of information on visitors is the web log. Some web server statistics report on traffic grouped by nation, but what [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,7],"tags":[],"class_list":["post-48","post","type-post","status-publish","format-standard","hentry","category-sql","category-udf-sql"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\r\n<title>Geolocation of IP Addresses in Snowflake - Snowflake in the Carolinas<\/title>\r\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\r\n<link rel=\"canonical\" href=\"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/\" \/>\r\n<meta property=\"og:locale\" content=\"en_US\" \/>\r\n<meta property=\"og:type\" content=\"article\" \/>\r\n<meta property=\"og:title\" content=\"Geolocation of IP Addresses in Snowflake - Snowflake in the Carolinas\" \/>\r\n<meta property=\"og:description\" content=\"It&#8217;s probably a safe assumption that business intelligence and data science teams can get valuable insights knowing the geolocation of website visitors. Suppose a product launch gets lots of web traffic, but the only source of information on visitors is the web log. Some web server statistics report on traffic grouped by nation, but what [&hellip;]\" \/>\r\n<meta property=\"og:url\" content=\"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/\" \/>\r\n<meta property=\"og:site_name\" content=\"Snowflake in the Carolinas\" \/>\r\n<meta property=\"article:published_time\" content=\"2019-05-30T21:56:29+00:00\" \/>\r\n<meta property=\"article:modified_time\" content=\"2019-09-10T16:48:39+00:00\" \/>\r\n<meta name=\"author\" content=\"Greg Pavlik\" \/>\r\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\r\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Greg Pavlik\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\r\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/\"},\"author\":{\"name\":\"Greg Pavlik\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#\\\/schema\\\/person\\\/019455f4675665b6cf5edea31ec44d7b\"},\"headline\":\"Geolocation of IP Addresses in Snowflake\",\"datePublished\":\"2019-05-30T21:56:29+00:00\",\"dateModified\":\"2019-09-10T16:48:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/\"},\"wordCount\":653,\"commentCount\":0,\"articleSection\":[\"SnowSQL\",\"UDF\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/\",\"url\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/\",\"name\":\"Geolocation of IP Addresses in Snowflake - Snowflake in the Carolinas\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#website\"},\"datePublished\":\"2019-05-30T21:56:29+00:00\",\"dateModified\":\"2019-09-10T16:48:39+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#\\\/schema\\\/person\\\/019455f4675665b6cf5edea31ec44d7b\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/index.php\\\/2019\\\/05\\\/30\\\/geolocation-of-ip-addresses-in-snowflake\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/snowflake.pavlik.us\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Geolocation of IP Addresses in Snowflake\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#website\",\"url\":\"https:\\\/\\\/snowflake.pavlik.us\\\/\",\"name\":\"Snowflake in the Carolinas\",\"description\":\"Random thoughts on all things Snowflake in the Carolinas\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/snowflake.pavlik.us\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/snowflake.pavlik.us\\\/#\\\/schema\\\/person\\\/019455f4675665b6cf5edea31ec44d7b\",\"name\":\"Greg Pavlik\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g\",\"caption\":\"Greg Pavlik\"},\"description\":\"Greg is a Senior Sales Engineer at Snowflake Computing, in the Raleigh-Durham area. He's been in data management and security for the twenty years.\"}]}<\/script>\r\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Geolocation of IP Addresses in Snowflake - Snowflake in the Carolinas","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/","og_locale":"en_US","og_type":"article","og_title":"Geolocation of IP Addresses in Snowflake - Snowflake in the Carolinas","og_description":"It&#8217;s probably a safe assumption that business intelligence and data science teams can get valuable insights knowing the geolocation of website visitors. Suppose a product launch gets lots of web traffic, but the only source of information on visitors is the web log. Some web server statistics report on traffic grouped by nation, but what [&hellip;]","og_url":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/","og_site_name":"Snowflake in the Carolinas","article_published_time":"2019-05-30T21:56:29+00:00","article_modified_time":"2019-09-10T16:48:39+00:00","author":"Greg Pavlik","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Greg Pavlik","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/#article","isPartOf":{"@id":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/"},"author":{"name":"Greg Pavlik","@id":"https:\/\/snowflake.pavlik.us\/#\/schema\/person\/019455f4675665b6cf5edea31ec44d7b"},"headline":"Geolocation of IP Addresses in Snowflake","datePublished":"2019-05-30T21:56:29+00:00","dateModified":"2019-09-10T16:48:39+00:00","mainEntityOfPage":{"@id":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/"},"wordCount":653,"commentCount":0,"articleSection":["SnowSQL","UDF"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/","url":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/","name":"Geolocation of IP Addresses in Snowflake - Snowflake in the Carolinas","isPartOf":{"@id":"https:\/\/snowflake.pavlik.us\/#website"},"datePublished":"2019-05-30T21:56:29+00:00","dateModified":"2019-09-10T16:48:39+00:00","author":{"@id":"https:\/\/snowflake.pavlik.us\/#\/schema\/person\/019455f4675665b6cf5edea31ec44d7b"},"breadcrumb":{"@id":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/snowflake.pavlik.us\/index.php\/2019\/05\/30\/geolocation-of-ip-addresses-in-snowflake\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/snowflake.pavlik.us\/"},{"@type":"ListItem","position":2,"name":"Geolocation of IP Addresses in Snowflake"}]},{"@type":"WebSite","@id":"https:\/\/snowflake.pavlik.us\/#website","url":"https:\/\/snowflake.pavlik.us\/","name":"Snowflake in the Carolinas","description":"Random thoughts on all things Snowflake in the Carolinas","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/snowflake.pavlik.us\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/snowflake.pavlik.us\/#\/schema\/person\/019455f4675665b6cf5edea31ec44d7b","name":"Greg Pavlik","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d81df729eebf37a042922b17d4a4c834b1e0ccfa9fea1c2c78cb8e95c7e91701?s=96&d=mm&r=g","caption":"Greg Pavlik"},"description":"Greg is a Senior Sales Engineer at Snowflake Computing, in the Raleigh-Durham area. He's been in data management and security for the twenty years."}]}},"_links":{"self":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts\/48","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/comments?post=48"}],"version-history":[{"count":5,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts\/48\/revisions"}],"predecessor-version":[{"id":53,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/posts\/48\/revisions\/53"}],"wp:attachment":[{"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/media?parent=48"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/categories?post=48"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/snowflake.pavlik.us\/index.php\/wp-json\/wp\/v2\/tags?post=48"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}