Snowflake Storage

Paradime enables you to read all the files generated by production schedules running in Paradime from the Paradime-hosted AWS S3 bucket, using a Snowflake Storage Integration.

To set up this integration, reach out to the Paradime team to get:

  • STORAGE_AWS_ROLE_ARN

  • STORAGE_ALLOWED_LOCATIONS

You will need these to create the storage integration in your Snowflake account.

1. Create a Snowflake Storage Integration

After receiving the AWS role and S3 path from the Paradime team, you will need to create a new storage integration to connect your Snowflake account to the AWS S3 bucket containing your schedules' run metadata.

Execute the SQL command below in Snowflake, replacing the placeholder text with the appropriate values.

USE ROLE ACCOUNTADMIN;

CREATE STORAGE INTEGRATION BOLT_METADATA
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = '<aws_role_provided_by_paradime>'
  STORAGE_ALLOWED_LOCATIONS = ('<s3_path_provided_by_paradime>');

Now you will need to get the STORAGE_AWS_IAM_USER_ARN and the STORAGE_AWS_EXTERNAL_ID for the storage integration you have just created from Snowflake by executing the command below, and share them with the Paradime team to complete the configuration.

DESC INTEGRATION BOLT_METADATA;
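As an optional convenience, if you run it immediately after the DESC statement above, a query like the sketch below uses RESULT_SCAN over the last query to pull out just the two values you need to share. This is not a required step.

-- Filter the DESC INTEGRATION output down to the two properties Paradime needs.
SELECT "property", "property_value"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "property" IN ('STORAGE_AWS_IAM_USER_ARN', 'STORAGE_AWS_EXTERNAL_ID');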

2. Create a Snowflake External Stage to query our metadata

You can now create a Snowflake External Stage to specify where data files are stored so that the data in the files can be loaded into a table.

First, let's create a file format in Snowflake. This will be needed in the next step when creating the STAGE.

CREATE OR REPLACE FILE FORMAT my_json_format
  TYPE = 'json';

In the example below, we are going to create an external stage based on our storage integration in the ANALYTICS database and the EXTERNAL schema.

Note that you can create multiple Snowflake Stages that connect to your Snowflake storage integration.

Below we are going to create a STAGE to access schedule metadata, like manifest.json, for any of the schedules in our default Paradime workspace (this is usually named after the organization name you entered at signup/login).

CREATE OR REPLACE STAGE ANALYTICS.EXTERNAL.BOLT_METADATA
  STORAGE_INTEGRATION = "BOLT_METADATA" 
  URL = 's3://paradime-s3-uany7edagtovarzx-eu-west-2/bolt/'
  FILE_FORMAT = my_json_format;

Alternatively, you can create a STAGE pointing directly to a given schedule name folder in the AWS S3 bucket.

CREATE OR REPLACE STAGE ANALYTICS.EXTERNAL.BOLT_METADATA
  STORAGE_INTEGRATION = "BOLT_METADATA" 
  URL = 's3://paradime-s3-uany7edagtovarzx-eu-west-2/bolt/run/daily_jaffle/current'
  FILE_FORMAT = my_json_format;
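Whichever URL you choose, a quick way to verify that the stage and the storage integration are wired up correctly is to list the files the stage can see. This is a suggested check rather than a required step.

-- List the files visible through the stage to confirm the integration works.
LIST @ANALYTICS.EXTERNAL.BOLT_METADATA;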

Query dbt™️ run artifacts directly from Snowflake

Now that the Snowflake Stage has been configured, you can query your Bolt schedules' metadata directly from Snowflake with a query like the one below.

👉 See also: Querying Snowflake Stage.

SELECT
  METADATA$FILENAME,
  METADATA$FILE_ROW_NUMBER,
  METADATA$FILE_CONTENT_KEY,
  METADATA$FILE_LAST_MODIFIED,
  METADATA$START_SCAN_TIME,
  parse_json($1) as JSON_META
FROM
  @ANALYTICS.EXTERNAL.BOLT_METADATA/target/manifest.json;
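From here, a common next step is to persist the parsed artifact in a table so downstream models can use it. The sketch below is one way to do that; the ANALYTICS.EXTERNAL.DBT_MANIFEST table name, and the assumption that the staged file is a dbt manifest.json with a top-level nodes object, are illustrative rather than part of the Paradime setup.

-- Illustrative only: persist the parsed manifest in a table.
CREATE OR REPLACE TABLE ANALYTICS.EXTERNAL.DBT_MANIFEST AS
SELECT
  METADATA$FILENAME            AS file_name,
  METADATA$FILE_LAST_MODIFIED  AS file_last_modified,
  parse_json($1)               AS json_meta
FROM @ANALYTICS.EXTERNAL.BOLT_METADATA/target/manifest.json;

-- Each dbt node (models, tests, seeds, ...) becomes one row.
SELECT
  n.key                          AS node_id,
  n.value:resource_type::string  AS resource_type,
  n.value:name::string           AS node_name
FROM ANALYTICS.EXTERNAL.DBT_MANIFEST,
  LATERAL FLATTEN(input => json_meta:nodes) n;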
