Category: SnowSQL

Multi-Table Inserts with Good and Bad Row Tables

Many customers have asked me how to separate good rows from bad rows during a data load. You can use the VALIDATE table function to return the errors encountered during a load, but that may not be exactly what you need.
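
For reference, here is a minimal sketch of VALIDATE (the table name T1 is hypothetical; '_last' refers to the most recent COPY into that table):

select * from table(validate(T1, job_id => '_last'));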

What you may be looking for is a design pattern like this:

  • Load data from files into a raw table
  • Except for major errors, insert rows with minor data quality issues into the raw table
  • After loading the raw table, insert good rows to staging (if more processing to do) or production
  • At the same time, insert bad rows into a separate table for examination of data quality problems

You usually load Snowflake tables from files. Since file data arrives as strings, you can define all columns in your raw table as string type; this ensures simple errors will not disrupt the load process. Major errors, such as an improper number of columns in a row, will still generate an error during the load. You can set the ON_ERROR copy option to control how Snowflake handles major errors like this.
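
For example, here is a hedged COPY sketch (the stage name and file layout are assumptions, not from this post); the ON_ERROR option controls the handling of major errors:

copy into SALES_RAW
  from @my_stage/sales/
  file_format = (type = 'CSV' skip_header = 1)
  on_error = 'CONTINUE';  -- other options include 'SKIP_FILE' and 'ABORT_STATEMENT'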

After defining a raw table, you can create a staging table or a production table. Either option uses proper data types instead of all strings. You’ll insert new rows to the target table while sending bad ones to a table containing the original bad values. You can then examine the bad rows to see why they failed to convert to the proper data types.

You can use the following SQL script as a template for how to use Snowflake multi-table inserts to do this.

-- Create a raw table with all columns defined as strings.
-- This will hold all raw values from the load files.
create or replace table SALES_RAW
(                                       -- Actual Data Type
  SALE_TIMESTAMP            string,     -- timestamp
  ITEM_SKU                  string,     -- int
  PRICE                     string,     -- number(10,2)
  IS_TAXABLE                string,     -- boolean
  COMMENTS                  string      -- string
);

-- Create the staging table with the actual data types.
create or replace table SALES_STAGE
(
  SALE_TIMESTAMP            timestamp,
  ITEM_SKU                  int,
  PRICE                     number(10,2),
  IS_TAXABLE                boolean,
  COMMENTS                  string
);

-- Simulate adding some rows from a load file. Two rows are good.
-- Four rows generate errors when converting to the data types.
insert into SALES_RAW 
    (SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS) 
    values
    ('2020-03-17 18:21:34', '23289', '3.42',   'TRUE',  'Good row.'),
    ('2020-17-03 18:21:56', '91832', '1.41',   'FALSE', 'Bad row: SALE_TIMESTAMP has the month and day transposed.'),
    ('2020-03-17 18:22:03', '7O242', '2.99',   'T',     'Bad row: ITEM_SKU has a capital "O" instead of a zero.'),
    ('2020-03-17 18:22:10', '53921', '$6.25',  'F',     'Bad row: PRICE should not have a dollar sign.'),
    ('2020-03-17 18:22:17', '90210', '2.49',   'Foo',   'Bad row: IS_TAXABLE cannot be converted to true or false'),
    ('2020-03-17 18:22:24', '80386', '1.89',   '1',     'Good row.');

-- Make sure the rows inserted okay.
select * from SALES_RAW;

-- Create a table to hold the bad rows.
create or replace table SALES_BAD_ROWS like SALES_RAW;

-- Using multi-table inserts (https://docs.snowflake.net/manuals/sql-reference/sql/insert-multi-table.html)
-- Insert good rows into SALES_STAGE and bad rows into SALES_BAD_ROWS
insert  first
  when  SALE_TIMESTAMP_X is null and SALE_TIMESTAMP is not null or
        ITEM_SKU_X       is null and ITEM_SKU       is not null or
        PRICE_X          is null and PRICE          is not null or
        IS_TAXABLE_X     is null and IS_TAXABLE     is not null
  then 
        into SALES_BAD_ROWS
            (SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS)
        values
            (SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS)  
  else 
        into SALES_STAGE 
            (SALE_TIMESTAMP, ITEM_SKU, PRICE, IS_TAXABLE, COMMENTS) 
         values 
            (SALE_TIMESTAMP_X, ITEM_SKU_X, PRICE_X, IS_TAXABLE_X, COMMENTS)
select  try_to_timestamp (SALE_TIMESTAMP)   as SALE_TIMESTAMP_X,
        try_to_number    (ITEM_SKU, 10, 0)  as ITEM_SKU_X,
        try_to_number    (PRICE, 10, 2)     as PRICE_X,
        try_to_boolean   (IS_TAXABLE)       as IS_TAXABLE_X,
                                               COMMENTS, 
                                               SALE_TIMESTAMP,
                                               ITEM_SKU,
                                               PRICE,
                                               IS_TAXABLE
from    SALES_RAW;

-- Examine the two good rows
select * from SALES_STAGE;

-- Examine the four bad rows
select * from SALES_BAD_ROWS;

If your incoming values are never null (or always have a default), you can eliminate the null checks on the original columns, and the WHEN clause simplifies to:

  when  SALE_TIMESTAMP_X is null or
        ITEM_SKU_X       is null or
        PRICE_X          is null or
        IS_TAXABLE_X     is null 

This works because if the original value isn’t null, the only reason its converted value would be null is a failed type cast. There’s one final note on this section of the SQL: why doesn’t the WHEN clause, with its mix of AND and OR operators, need parentheses?

The short answer is that AND has higher operator precedence than OR. This is true in SQL and most programming languages, so it will seem familiar to many people. If it improves clarity, you can add parentheses. As an academic exercise, the following statements show the operator precedence of AND and OR.

select TRUE and TRUE or FALSE and FALSE; -- In operator precedence, AND comes first, then left to right. This evaluates to TRUE.

-- Processing AND first
select (TRUE and TRUE) or (FALSE and FALSE); -- This is functionally equivalent to the above statement.

-- Processing OR first
select TRUE and (TRUE or FALSE) and FALSE; -- This shows what would happen if OR had higher operator precedence

-- Processing only left to right
select ((TRUE and TRUE) or FALSE) and FALSE; -- This shows what would happen with no operator precedence, just left to right

Helper Functions in Snowflake Stored Procedures

Snowflake supports JavaScript stored procedures. You may choose to start by copying and modifying one of the sample stored procedures in the Snowflake documentation.

As you add more SQL statements, exception handling, and general complexity, keeping all of the code in the main JavaScript function risks producing spaghetti code.

Fortunately, Snowflake stored procedures allow more than one function. In JavaScript, helper functions are additional functions called from a main function.

It’s easy to write a helper function. Just before the main function’s final curly bracket, add the following:

function HelperFunction(stringIn) {

    //Do something here, and then return the value:
    var s = stringIn;
    return s;
}

You can also use the Snowflake stored procedure API inside helper functions. Here are two helper functions using that API which make your main function more readable and modular. ExecuteNonQuery executes a DML statement or any other SQL statement that does not return a table. ExecuteSingleValueQuery fetches the first row’s value for a specified column; you can use it to retrieve flags, settings, or other values from control tables.

create or replace procedure SampleSP()
returns string
language javascript
as
$$
    var s;

    try{
        ExecuteNonQuery("create or replace table MY_NATION_TABLE like SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION;");
        ExecuteNonQuery("insert into MY_NATION_TABLE select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION;");
        s = ExecuteSingleValueQuery("N_NAME", "select * from MY_NATION_TABLE where N_NATIONKEY = 24;");
        ExecuteNonQuery("drop table MY_NATION_TABLE;");
        return s;
    }
    catch(err){
        return err;
    }
// ----------------------------------------------------------------------------------
// Main function above; helper functions below

    function ExecuteNonQuery(queryString) {
        var out = '';
        cmd1 = {sqlText: queryString};
        stmt = snowflake.createStatement(cmd1);
        var rs;
        try{
            rs = stmt.execute();
            rs.next();
            out = "SUCCESS: " + rs.getColumnValue(1);
        }
        catch(err) {
            throw "ERROR: " + err.message.replace(/\n/g, " ");
        }
        return out;
    }

    function ExecuteSingleValueQuery(columnName, queryString) {
        var out;
        cmd1 = {sqlText: queryString};
        stmt = snowflake.createStatement(cmd1);
        var rs;
        try{
            rs = stmt.execute();
            rs.next();
            return rs.getColumnValue(columnName);
        }
        catch(err) {
            if (err.message.substring(0, 18) == "ResultSet is empty"){
                throw "ERROR: No rows returned in query.";
            } else {
                throw "ERROR: " + err.message.replace(/\n/g, " ");
            } 
        }
        return out;
    }
$$;

call SampleSP();

Getting a Complete List of User Privileges in Snowflake

These queries don’t need much explanation. Some customers have asked me how to get a complete list of user privileges, often for auditing purposes. The two queries below show the role hierarchy (which roles have been granted which other roles) and a complete list of effective privileges for each user.

For instance, if someone grants user ‘MARY’ the ‘PLAN_9’ role, and that role has the privilege to select from ‘TABLE_X’, then one row in the result will show that MARY can select from TABLE_X because she’s been granted the PLAN_9 role. Every other user granted the PLAN_9 role will likewise show a row with this combination of user, the role granting the privilege, and the privilege itself.

Snowflake enforces a security and governance best practice called role-based access control (RBAC): privileges are granted to roles, not directly to users. To give a user a privilege, add the user to a role that has the privilege.
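
For example, here is a minimal sketch using the names from the example above:

-- Privileges go to the role; the user gets the role.
create role PLAN_9;
grant select on table TABLE_X to role PLAN_9;
grant role PLAN_9 to user MARY;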

-- The data returned by both queries is in the
-- SNOWFLAKE database, which has latency of up
-- to 3 hours to reflect changes

-- Get the effective role hierarchy for each user.
with
   -- CTE gets all the roles each role is granted
   ROLE_MEMBERSHIPS(ROLE_GRANTEE, ROLE_GRANTED_THROUGH_ROLE)
   as
    (
    select   GRANTEE_NAME, "NAME"
    from     SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_ROLES
    where    GRANTED_TO = 'ROLE' and
             GRANTED_ON = 'ROLE' and
             DELETED_ON is null
    ),
    -- CTE gets all roles a user is granted
    USER_MEMBERSHIPS(ROLE_GRANTED_TO_USER, USER_GRANTEE, GRANTED_BY)
    as
     (
     select ROLE,
            GRANTEE_NAME,
            GRANTED_BY
     from SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERS
     where DELETED_ON is null
     )
-- 
select 
        USER_GRANTEE,
        case
            when ROLE_GRANTED_THROUGH_ROLE is null 
                then ROLE_GRANTED_TO_USER 
            else ROLE_GRANTED_THROUGH_ROLE
        end 
        EFFECTIVE_ROLE,
        GRANTED_BY,
        ROLE_GRANTEE,
        ROLE_GRANTED_TO_USER,
        ROLE_GRANTED_THROUGH_ROLE
from    USER_MEMBERSHIPS U
    left join ROLE_MEMBERSHIPS R
        on U.ROLE_GRANTED_TO_USER = R.ROLE_GRANTEE
;

--------------------------------------------------------------------------------------------------

-- This gets all the grants for all of the users:
with 
    ROLE_MEMBERSHIPS
        (
            ROLE_GRANTEE, 
            ROLE_GRANTED_THROUGH_ROLE
        )
    as
    (
        -- This lists all the roles a role is in
        select   GRANTEE_NAME, "NAME"
        from     SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_ROLES
        where    GRANTED_TO = 'ROLE' and
                 GRANTED_ON = 'ROLE' and
                 DELETED_ON is null
    ),
    USER_MEMBERSHIPS
        (
            ROLE_GRANTED_TO_USER,
            USER_GRANTEE,
            GRANTED_BY
        )
    as
     (
        select ROLE,GRANTEE_NAME,GRANTED_BY
        from SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERS
        where DELETED_ON is null
     ),
    EFFECTIVE_ROLES
    (
        USER_GRANTEE,
        EFFECTIVE_ROLE,
        GRANTED_BY,
        ROLE_GRANTEE,
        ROLE_GRANTED_TO_USER,
        ROLE_GRANTED_THROUGH_ROLE
    )
    as
    (
        select 
            USER_GRANTEE,
            case 
                when ROLE_GRANTED_THROUGH_ROLE is null
                    then ROLE_GRANTED_TO_USER
                else ROLE_GRANTED_THROUGH_ROLE
            end
            EFFECTIVE_ROLE,
            GRANTED_BY,
            ROLE_GRANTEE,
            ROLE_GRANTED_TO_USER,
            ROLE_GRANTED_THROUGH_ROLE
        from USER_MEMBERSHIPS U
            left join ROLE_MEMBERSHIPS R
            on U.ROLE_GRANTED_TO_USER = R.ROLE_GRANTEE
    ),
    GRANT_LIST
        (
            CREATED_ON,
            MODIFIED_ON,
            PRIVILEGE,
            GRANTED_ON, 
            "NAME",
            TABLE_CATALOG,
            TABLE_SCHEMA,
            GRANTED_TO,
            GRANTEE_NAME,
            GRANT_OPTION
        )
    as
    (
        -- This shows all the grants (other than to roles)
        select  CREATED_ON,
                MODIFIED_ON,
                PRIVILEGE,
                "NAME",
                TABLE_CATALOG,
                TABLE_SCHEMA,
                GRANTED_TO,
                GRANTEE_NAME,
                GRANT_OPTION,
                GRANTED_ON
        from    SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_ROLES
        where   GRANTED_ON <> 'ROLE' and
                PRIVILEGE <> 'USAGE' and 
                DELETED_ON is null
    )
select * from EFFECTIVE_ROLES R
    left join GRANT_LIST G 
        on G.GRANTED_TO = R.EFFECTIVE_ROLE
where G.PRIVILEGE is not null
;

Executing Multiple SQL Statements in a Stored Procedure – Part Deux

A customer requested the ability to execute multiple SQL statements that result from a query. Today I learned about a new use case that required some augmentation of the stored procedure. Specifically, what happens when one of the many generated SQL statements encounters an error?

This updated stored procedure handles generated SQL statements that may encounter errors. You have two options for handling them: report each error and continue, or report the first error and stop.

The stored procedure is overloaded, meaning that you can call it with or without the second parameter “continueOnError”. If you do not supply the second parameter, it will default to false and stop after the first error.
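
For example, both of these calls are valid (using the query_text variable set later in this script):

Call RunBatchSQL($query_text);        -- stops after the first error
Call RunBatchSQL($query_text, true);  -- reports each error and continues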

The output of the stored procedure is as follows:

<SQL statement to run> --Succeeded
<SQL statement to run> --Failed: <reason for failure>

By indicating the success or failure status as a SQL comment, you can modify and re-run the line manually or do some troubleshooting.

use database TEST;
use warehouse TEST;
 
create or replace procedure RunBatchSQL(sqlCommand String)
    returns string
    language JavaScript
as
$$
/**
 * Stored procedure to execute multiple SQL statements generated from a SQL query
 * Note that this procedure will always use the column named "SQL_COMMAND"
 * This overload of the function will stop after encountering the first error.
 *
 * @param {String} sqlCommand: The SQL query to run to generate one or more SQL commands 
 * @return {String}: A string containing all the SQL commands executed, each separated by a newline. 
 */
      cmd1_dict = {sqlText: SQLCOMMAND};
      stmt = snowflake.createStatement(cmd1_dict);
      rs = stmt.execute();
      var s = '';
      var CONTINUEONERROR = false; // Default to false for overloaded function
      while (rs.next()) {
          try{
                cmd2_dict = {sqlText: rs.getColumnValue("SQL_COMMAND")};
                stmtEx = snowflake.createStatement(cmd2_dict);
                stmtEx.execute();
                s += rs.getColumnValue(1) + " --Succeeded" + "\n";
             }
          catch(err) {
                s += rs.getColumnValue(1) + " --Failed: " + err.message.replace(/\n/g, " ") + "\n";
                if (!CONTINUEONERROR) return s;
          }
      }
      return s;
$$;
 
create or replace procedure RunBatchSQL(sqlCommand String, continueOnError Boolean)
    returns string
    language JavaScript
as
$$
/**
 * Stored procedure to execute multiple SQL statements generated from a SQL query
 * Note that this procedure will always use the column named "SQL_COMMAND".
 * This overload of the function will continue on errors if "continueOnError" = true.
 *
 * @param {String} sqlCommand: The SQL query to run to generate one or more SQL commands 
 * @param {Boolean} continueOnError: If true, continues on error. If false, stops after first error.
 * @return {String}: A string containing all the SQL commands executed, each separated by a newline. 
 */
      cmd1_dict = {sqlText: SQLCOMMAND};
      stmt = snowflake.createStatement(cmd1_dict);
      rs = stmt.execute();
      var s = '';
      while (rs.next()) {
          try{
                cmd2_dict = {sqlText: rs.getColumnValue("SQL_COMMAND")};
                stmtEx = snowflake.createStatement(cmd2_dict);
                stmtEx.execute();
                s += rs.getColumnValue(1) + " --Succeeded" + "\n";
             }
          catch(err) {
                s += rs.getColumnValue(1) + " --Failed: " + err.message.replace(/\n/g, " ") + "\n";
                if (!CONTINUEONERROR) return s;
          }
      }
      return s;
$$
;

-- This is a select query that will generate a list of SQL commands to execute, in this case some grant statements. 
-- This SQL will generate rows to grant select on all tables for the DBA role (change to specify another role). 
select distinct ('grant select on table ' || table_schema || '.' || table_name || ' to role DBA;') AS SQL_COMMAND
from INFORMATION_SCHEMA.TABLE_PRIVILEGES
where TABLE_SCHEMA <> 'AUDIT'
order by SQL_COMMAND;
 
-- As a convenience, this grabs the last SQL run so that it's easier to insert into the parameter used to call the stored procedure. 
set query_text = (  select QUERY_TEXT
                    from table(information_schema.query_history(result_limit => 2))
                    where SESSION_ID = Current_Session() and QUERY_TYPE = 'SELECT' order by START_TIME desc);
 
-- Confirm that the query_text variable has the correct SQL query to generate our SQL commands (grants in this case) to run.
select $query_text as QUERY_TEXT;
 
-- Run the stored procedure. Note that to view its output better, double-click on the output to see it in multi-line format.
Call RunBatchSQL($query_text, true);
 
--Check the last several queries run to make sure it worked.
select QUERY_TEXT
from table(information_schema.query_history(result_limit => 100))
where SESSION_ID = Current_Session() order by START_TIME desc;

Snowflake UDF to Get Payment Card Type

This User Defined Function (UDF) doesn’t require much explanation. Payment card number goes in; payment card type comes out. Since it is designed for speed, it does not validate the check digit. A subsequent post will provide a UDF to validate the check digit using the Luhn algorithm.

/********************************************************************************************************************

Function:    PaymentCardType
Description: Decodes the type of payment card: Visa, Mastercard, AMEX, etc.
Parameters:  cardNumber (string) - the payment card number
Returns:     A string indicating the type of payment card, or a blank string if not identified.

*********************************************************************************************************************/
create or replace function PaymentCardType(cardNumber string)
  returns string 
  language javascript
  strict
  as '
     
    //Remove all spaces and dashes. Simply ignore them.
    NUMBER = CARDNUMBER.replace(/ /g, "");
    NUMBER = NUMBER.replace(/-/g, "");
     
     
    // Visa
    var re = new RegExp("(4[0-9]{15})");
    if (NUMBER.match(re) != null)
        return "Visa";

    // Mastercard
    re = new RegExp("(5[1-5][0-9]{14})");
    if (NUMBER.match(re) != null)
        return "Mastercard";

    // AMEX
    re = new RegExp("^3[47]");
    if (NUMBER.match(re) != null)
        return "AMEX";

    // Discover
    re = new RegExp("^(6011|622(12[6-9]|1[3-9][0-9]|[2-8][0-9]{2}|9[0-1][0-9]|92[0-5]|64[4-9])|65)");
    if (NUMBER.match(re) != null)
        return "Discover";

    // Diners
    re = new RegExp("^36");
    if (NUMBER.match(re) != null)
        return "Diners";

    // Diners - Carte Blanche
    re = new RegExp("^30[0-5]");
    if (NUMBER.match(re) != null)
        return "Diners - Carte Blanche";

    // JCB
    re = new RegExp("^35(2[89]|[3-8][0-9])");
    if (NUMBER.match(re) != null)
        return "JCB";

    // Visa Electron
    re = new RegExp("^(4026|417500|4508|4844|491(3|7))");
    if (NUMBER.match(re) != null)
        return "Visa Electron";

    return "";
 
  ';

-- Test the UDF:
select PaymentCardType('4470653497431234');
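
As a preview of that follow-up post, here is a minimal sketch of a Luhn check-digit UDF. It assumes the same space-and-dash cleanup as PaymentCardType; the function name is hypothetical, and this is only a sketch rather than the validated version the later post will provide.

create or replace function PaymentCardValidLuhn(cardNumber string)
  returns boolean
  language javascript
  strict
  as '

    //Remove all spaces and dashes, as in PaymentCardType.
    var NUMBER = CARDNUMBER.replace(/[ -]/g, "");
    if (!/^[0-9]+$/.test(NUMBER))
        return false;

    //Luhn algorithm: starting from the rightmost digit, double every
    //second digit; subtract 9 from any doubled value over 9; sum all digits.
    var sum = 0;
    var doubleIt = false;
    for (var i = NUMBER.length - 1; i >= 0; i--) {
        var d = parseInt(NUMBER.charAt(i), 10);
        if (doubleIt) {
            d *= 2;
            if (d > 9) d -= 9;
        }
        sum += d;
        doubleIt = !doubleIt;
    }

    //The number passes the check if the sum is a multiple of 10.
    return sum % 10 == 0;

  ';

-- Test the UDF (4111111111111111 is a well-known Luhn-valid test number):
select PaymentCardValidLuhn('4111-1111-1111-1111');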

Object Dependency Checking in Snowflake

Snowflake allows dropping an object that has a view dependent on it. For example, create tables A and B, and a view C that joins A and B together. Snowflake will let you drop table A or B, and when you next attempt to use view C you will get an error. This behavior is fairly common among DBMSes.
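
Here is a minimal sketch of that scenario (the object names match the sample output at the end of this post):

create database if not exists TEST;
create schema TEST.VIEWTEST;
create table TEST.VIEWTEST.A (ID int);
create table TEST.VIEWTEST.B (ID int);
create view TEST.VIEWTEST.C as
    select A.ID from TEST.VIEWTEST.A A join TEST.VIEWTEST.B B on A.ID = B.ID;

drop table TEST.VIEWTEST.A;    -- Snowflake allows this even though view C depends on A.
select * from TEST.VIEWTEST.C; -- Fails: Object 'TEST.VIEWTEST.A' does not exist.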

It’s important to document and maintain dependencies using a combination of source control, a rigorous testing regimen, and promotion protocols. That said, here’s a stored procedure to check object dependencies, specifically to see whether all views are working. The stored procedure inventories all views in the Snowflake account and attempts to select a single row from each. If the selection succeeds, it reports success for that view; if it fails, it reports the reason for the failure.

Note that running this stored procedure could take a very long time, since it selects one row from every view in the account. This is a prototype procedure; a future version will add the ability to limit which databases and/or schemas to test, probably using regular expressions for pattern matching. That will allow testing of, for example, only the views in the PROD database whose names match a given prefix, since a change is often likely to affect just those views.

use warehouse TEST;  -- An Extra Small warehouse is best for this test.
use role SYSADMIN;   -- Could be any role except for ACCOUNTADMIN. The ACCOUNTADMIN will own the SP to test views and grant usage to this role.
use database TEST;   -- Could be any database; just put one in context.

use role ACCOUNTADMIN; -- We need to create the stored procedure as ACCOUNTADMIN to ensure we have read access to all views.

/********************************************************************************************************************

Procedure:   TestAllViews
Description: Executes as ACCOUNTADMIN and attempts to select a single row from every view in the account. For each
             view in the account, it returns "Success" if it can read a single row, or "Failure" and the error 
             message if it fails trying to read a single row. This enables a quick sanity check of the views after
             changing objects, such as dropping a table, view, or column.
Parameters:  None.
Return:      A string with each line the test result of each view.
Notes:       This is a prototype SP. Future versions will have parameters to limit databases and schemas to check.

*********************************************************************************************************************/
create or replace procedure TEST.PUBLIC.TestAllViews()
returns string
language JavaScript
execute as OWNER
as
  $$  
   
    var sql_command = 
    `select DATABASE_NAME from INFORMATION_SCHEMA.DATABASES;`;
    
    var databaseArray = new Array();

    try {
            var stmt = snowflake.createStatement( {sqlText: sql_command} );
            var resultSet = stmt.execute();
            
            while (resultSet.next())  {
                databaseArray.push(resultSet.getColumnValue('DATABASE_NAME'));
            }
        }
    catch (err)  {
        return "Failed: " + err;   // Return a success/error indicator.
        }
    
    var outString = '';
    var index = 0;
    
    var viewArray = new Array();
    
    while (index < databaseArray.length) { 
        
        sql_command = "select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME from " + databaseArray[index] + 
                      ".INFORMATION_SCHEMA.VIEWS where TABLE_SCHEMA <> 'INFORMATION_SCHEMA';";
        
        try {
            var stmt = snowflake.createStatement( {sqlText: sql_command} );
            var resultSet = stmt.execute();
            
            while (resultSet.next())  {
                viewArray.push(resultSet.getColumnValue('TABLE_CATALOG') + "." + 
                               resultSet.getColumnValue('TABLE_SCHEMA') + "." + 
                               resultSet.getColumnValue('TABLE_NAME'));
            }
        }
    catch (err)  {
        return "Failed: " + err;   // Return a success/error indicator.
        }
      
      index++; 
    }
    
    index = 0;
    
    // We now have a complete list of views... Test them one at a time...
    
    var outArray = new Array();

    index = 0;

    while (index < viewArray.length) { 
      sql_command = "select * from " + viewArray[index] + " limit 1;";

      try {
        var stmt = snowflake.createStatement( {sqlText: sql_command} );
        var resultSet = stmt.execute();
        
        outArray.push("Success..." + viewArray[index]);
      }
      catch (err) {
        outArray.push("Failure..." + viewArray[index] + " : " + err.message.replace(/(\r\n|\n|\r)/gm, " - "));
       }
      index++; 
    }

    index = 0;    

    while (index < outArray.length) { 
      outString += outArray[index] + "\n"; 
      index++; 
    }   

   return outString;

  $$;

grant usage on procedure TEST.PUBLIC.TestAllViews() to role SYSADMIN;

use role SYSADMIN;

call TEST.PUBLIC.TestAllViews();
-- Note: Completing this SP could take a long time.

/*

The output should look something like this (note that table "TEST.VIEWTEST.A" was dropped to break view "TEST.VIEWTEST.C"):

Success...CITIBIKE_BIG_V2.CLUSTERED.TRIPS_VW
Success...CITIBIKE_BIG_V2.CLUSTERED.TRIPS_WEATHER_VW
Success...CITIBIKE_BIG_V2.UNCLUSTERED.TRIPS_VW
Success...CITIBIKE_BIG_V2.UNCLUSTERED.TRIPS_WEATHER_VW
Success...SALES.PUBLIC.REVENUE
Success...SALES.PUBLIC.SALES_REVENUE
Success...SALES_05_31_19.PUBLIC.REVENUE
Success...SALES_05_31_19.PUBLIC.SALES_REVENUE
Success...TEST.AUDIT.NEW_VIEW
Failure...TEST.VIEWTEST.C : SQL compilation error: - Failure during expansion of view 'C': SQL compilation error: - Object 'TEST.VIEWTEST.A' does not exist.
Success...TEST.PUBLIC.ORDERS_SECURE_VIEW
Success...TEST.TPCH_SF1.ORDERS_SECURE_VIEW
Success...TEST_CHAR.PUBLIC.V_TEST_CHAR

*/

Setting the Default Timezone for a Snowflake Account

By default, Snowflake accounts have their timezone set to US/Pacific. Here is how to change that permanently at the account level. You can still override it for specific users or sessions.

Set the account’s default time zone to US Eastern:

use role ACCOUNTADMIN;  -- Must have ACCOUNTADMIN to change the setting.
alter account set TIMEZONE = 'America/New_York';
use role SYSADMIN;  -- (Best practice: change role when done using ACCOUNTADMIN)

Set the account’s default time zone to UTC (Coordinated Universal Time, also called Greenwich Mean Time, GMT, or Zulu Time):

use role ACCOUNTADMIN; 
alter account set TIMEZONE = 'Etc/UTC';
use role SYSADMIN;
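
To override the account default for an individual user or for the current session only (the user name here is hypothetical):

alter user JSMITH set TIMEZONE = 'America/Chicago'; -- per-user override
alter session set TIMEZONE = 'America/Chicago';     -- current session only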

A complete list of time zones is available here: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones.

Executing Multiple SQL Statements in a Stored Procedure

A classic DBA technique for running a large number of SQL statements is to generate them with a concatenating select statement. Suppose you need to drop all tables that end with “TEST”. You can list them from Snowflake’s INFORMATION_SCHEMA using:

select "TABLE_NAME" from INFORMATION_SCHEMA.TABLES where "TABLE_NAME" ilike '%TEST';

If you only need to drop a handful of tables, that list is enough. If there are dozens or hundreds, this works better:

select 'drop table ' || "TABLE_NAME" || ';' from INFORMATION_SCHEMA.TABLES where "TABLE_NAME" ilike '%TEST';

The above example generates a drop statement for each table in the database whose name ends with TEST. You can use this technique to generate bulk SQL for a wide range of management tasks. For example, you may want to change the ownership of tables in bulk, like this:

SELECT 'grant ownership on table ' || 
       table_name || 
       ' to role my_new_role copy grants;' 
       AS SQL_COMMAND
FROM INFORMATION_SCHEMA.TABLE_PRIVILEGES 
WHERE grantor = 'old_grant_role';

Executing this SQL will generate rows that look like this:

Row  SQL_COMMAND
1    grant ownership on table CUSTOMER to role my_new_role copy grants;
2    grant ownership on table LINEITEM to role my_new_role copy grants;
3    grant ownership on table NATION to role my_new_role copy grants;
4    grant ownership on table ORDERS to role my_new_role copy grants;

You can see where this is heading. A SQL statement generates statements to run, and then someone (or preferably, something automated) runs them.

Here is that something. The Snowflake stored procedure below will:

  • Accept a string parameter that is a SQL statement designed to generate rows of SQL statements to execute.
  • Execute the input SQL statement to generate a list of SQL statements to run.
  • Run all statements identified by the “SQL_COMMAND” column one at a time.

use database TEST;
use warehouse TEST;

create or replace procedure RunBatchSQL(sqlCommand String)
    returns string
    language JavaScript
as
$$
/**
 * Stored procedure to execute multiple SQL statements generated from a SQL query
 * Note that this procedure will always use the column named "SQL_COMMAND"
 *
 * @param {String} sqlCommand: The SQL query to run to generate one or more SQL commands 
 * @return {String}: A string containing all the SQL commands executed, each separated by a newline. 
 */
      cmd1_dict = {sqlText: SQLCOMMAND};
      stmt = snowflake.createStatement(cmd1_dict);
   
      rs = stmt.execute();

      var s = '';

      while (rs.next())  {
          cmd2_dict = {sqlText: rs.getColumnValue("SQL_COMMAND")};
          stmtEx = snowflake.createStatement(cmd2_dict);
          stmtEx.execute();
          s += rs.getColumnValue(1) + "\n";
          }
          
      return s;
      
$$
;

-- This is a select query that will generate a list of SQL commands to execute, in this case some grant statements. 
-- This SQL will generate rows to grant select on all tables for the DBA role (change to specify another role). 
select distinct ('grant select on table ' || table_schema || '.' || table_name || ' to role DBA;') AS SQL_COMMAND
from INFORMATION_SCHEMA.TABLE_PRIVILEGES
where TABLE_SCHEMA <> 'AUDIT'
order by SQL_COMMAND;

-- As a convenience, this grabs the last SQL run so that it's easier to insert into the parameter used to call the stored procedure. 
set query_text = (  select QUERY_TEXT
                    from table(information_schema.query_history(result_limit => 2))
                    where SESSION_ID = Current_Session() and QUERY_TYPE = 'SELECT' order by START_TIME desc);

-- Confirm that the query_text variable has the correct SQL query to generate our SQL commands (grants in this case) to run.
select $query_text;

-- Run the stored procedure. Note that to view its output better, double-click on the output to see it in multi-line format.
Call RunBatchSQL($query_text);

--Check the last several queries run to make sure it worked.
select QUERY_TEXT
from table(information_schema.query_history(result_limit => 100))
where SESSION_ID = Current_Session() order by START_TIME desc;

Capturing Audit Trails in Snowflake

This article discusses two types of audit trails: 1) SQL statement execution logs, and 2) session logs for security.

The Snowflake documentation discusses the QUERY_HISTORY functions, which “return query activity within the last 7 days.” Although limited to seven days, these functions provide an easy way to create a SQL log retained indefinitely.

--Create database AUDIT_DB
create database AUDIT_DB;

Next, create a SQL audit table with all available information returnable from the QUERY_HISTORY function.

--Create an audit table
create table "AUDIT_DB"."PUBLIC"."AUDIT_LOG" as
select *
  from table(information_schema.query_history(RESULT_LIMIT=>10000))
order by START_TIME;

Since QUERY_HISTORY returns up to seven days of SQL statement history, preserving the log indefinitely requires running this SQL at least once every N days, where N < 7:

--Add new audit items to the table
insert into "AUDIT_DB"."PUBLIC"."AUDIT_LOG"
  select * from table(information_schema.query_history(RESULT_LIMIT=>10000))
  where START_TIME > (select max(START_TIME) as LASTDATE from AUDIT_DB.PUBLIC.AUDIT_LOG)
order by START_TIME;
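
One way to automate that cadence is a Snowflake task, if tasks are enabled in your account. This is a hedged sketch; the warehouse name and schedule are assumptions:

--Run the capture daily, well within the seven-day window
create or replace task AUDIT_DB.PUBLIC.CAPTURE_AUDIT_LOG
  warehouse = TEST
  schedule = 'USING CRON 0 2 * * * UTC'
as
insert into "AUDIT_DB"."PUBLIC"."AUDIT_LOG"
  select * from table(AUDIT_DB.information_schema.query_history(RESULT_LIMIT=>10000))
  where START_TIME > (select max(START_TIME) from AUDIT_DB.PUBLIC.AUDIT_LOG);

--Tasks are created suspended; resume the task to start the schedule
alter task AUDIT_DB.PUBLIC.CAPTURE_AUDIT_LOG resume;

The same pattern covers the second audit-trail type, session logs, using the LOGIN_HISTORY table function (also limited to seven days of history):

--Sketch: capture recent login events for a security audit trail
select *
from table(information_schema.login_history(RESULT_LIMIT=>1000))
order by EVENT_TIMESTAMP;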

CHAR Padding in Snowflake

For strings that need to be fixed-width, it’s common to use the CHAR data type, for example CHAR(10). Many databases will right-pad a CHAR(10) value that has fewer than 10 characters with spaces.

If we define a CHAR(10) and store “ABCDE” in the field, some databases will add spaces on the right to fill the field to a width of 10. In the example below, the “.” character represents a space:

Column_Name
1234567890
ABCDE.....

A side effect of the advanced way Snowflake stores strings is that a column defined as CHAR(10) holding only five characters, such as “ABCDE”, will return just those five characters rather than the 10-character padded value “ABCDE     ”. From the Snowflake documentation:

Snowflake currently deviates from common CHAR semantics in that strings shorter than the maximum length are not space-padded at the end.

https://docs.snowflake.net/manuals/sql-reference/data-types-text.html#char-character

For use cases where fixed-width strings are important, there are a couple of options. One option is to right-pad the strings on ingest (sketched below), but this adds work and storage overhead.
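
Here is a sketch of that ingest-time option, padding during a COPY transformation (the stage name is hypothetical):

--Pad the first file column to width 10 while loading
copy into TEST_CHAR
from (select rpad(t.$1, 10, ' ') from @my_stage t);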

I think a better way is to create a simple view that returns right-padded strings for columns defined as CHAR. Fortunately, the Snowflake RPAD function makes this easy.

The following sample shows how to create a view to ensure fixed-width columns. Note that since CHAR is a synonym for VARCHAR, this technique works with columns defined as CHAR, VARCHAR, STRING, CHARACTER, and TEXT.

--Create a simple test table with a CHAR(10)
create table TEST_CHAR
    (
    TEST_COLUMN char(10)
    );

--Insert some values 
insert into TEST_CHAR (TEST_COLUMN) values ('1234');
insert into TEST_CHAR (TEST_COLUMN) values ('12345');
insert into TEST_CHAR (TEST_COLUMN) values ('1234567890');

--Create a view that right pads the CHAR fields to width 10
--Change the space to another character to see the effect.
create or replace view V_TEST_CHAR as
select RPAD(TEST_COLUMN, 10, ' ') as TEST_COLUMN from TEST_CHAR;

--Select from the view.
select * from V_TEST_CHAR;