Snapshots track historical changes in your data warehouse by implementing Type 2 Slowly Changing Dimensions (SCD). Instead of overwriting data, snapshots create new records while preserving history of changes.
In data warehousing, Type 2 SCD is one of several methods for handling changes:
Type 1: Overwrites old data (no history)
Type 2: Preserves history by creating new records (what snapshots implement)
Type 3: Adds new columns for changes
Snapshots are ideal for tracking changes in fields that update infrequently and require historical tracking, such as customer status changes, product pricing, and order processing states.
To understand how snapshots preserve history, let's look at a customer account status change:
-- Let's say we have a customer account table where status can change-- Here's what it looks like initiallyid status updated_at101 active 2023-06-15-- When the customer downgrades their account, the data changes to:id status updated_at101basic2023-08-20-- Without snapshots, we've lost the history of when they were 'active'.-- However, with snapshots, we preserve the complete history:id status updated_at dbt_valid_from dbt_valid_to101 active 2023-06-152023-06-152023-08-20-- Historical record101basic2023-08-202023-08-20null-- Current record
Using this snapshot, we can query historical states, like a customer's status on any given date.
Basic Configuration
Prerequisites
Create a target schema for snapshots in your data warehouse
Ensure source data has either reliable timestamps or columns to track for changes
Snapshots are defined in .sql files within your project's snapshots directory:
If you are brand new to using snapshots, use this step-by-step tutorial to create and test a snapshot with different strategies.
Configuration Example
Initial Setup
Create source table:
-- models/staging/stg_test_users.sqlWITH source_data AS ( SELECT 1as user_id,'active'asstatus,'user1@test.com'as email, CURRENT_TIMESTAMP() as updated_at UNION ALL SELECT 2, 'pending', 'user2@test.com', CURRENT_TIMESTAMP())SELECT * FROM source_data
This shows our initial state: two users with their current statuses. The null in dbt_valid_to indicates these are the current active records.
Test Status Change
Update source data:
-- models/staging/stg_test_users.sqlWITH source_data AS ( SELECT 1as user_id,'inactive'asstatus,'user1@test.com'as email, CURRENT_TIMESTAMP() as updated_at)SELECT * FROM source_data
Run commands:
dbtrun--selectstg_test_usersdbtsnapshot
Verify in Snowflake:
SELECT * FROM your_database.snapshots.users_snapshot WHERE user_id =1ORDER BY dbt_valid_from;
Now we see the history: the user's original 'active' status is preserved with an end date ([time2]), and a new record shows their current 'inactive' status.
-- models/staging/stg_test_users.sqlWITH source_data AS ( SELECT 1as user_id,'inactive'asstatus,'newemail@test.com'as email, CURRENT_TIMESTAMP() as updated_at)SELECT * FROM source_data
Run commands:
dbtrun--selectstg_test_usersdbtsnapshot
Verify in Snowflake:
SELECT * FROM your_database.snapshots.users_snapshot ORDER BY user_id, dbt_valid_from;
This shows the complete history: the original status, the status change, and finally the email change. Each record shows when it was valid, creating a complete historical trail of changes.
Meta Fields
As of dbt version ≥ 1.9, snapshots add four tracking columns:
dbt_valid_from: When this version became valid
dbt_valid_to: When this version became invalid (null for current version)
dbt_updated_at: Timestamp when the snapshot was taken
dbt_scd_id: Unique identifier for each version
Best Practices
1. Snapshot Frequency
Execute dbt snapshot before running dependent models
Schedule regular snapshots in production based on data change frequency
Consider storage costs versus historical data needs
3. Query Historical Data
To access historical data from the snapshot table, filter to see the state of a specific record (order_id 123) as it existed on a given date (2024-01-01).
-- Get data as it looked on a specific dateSELECT *FROM {{ ref('orders_snapshot') }}WHERE order_id =123 AND dbt_valid_from <='2024-01-01' AND (dbt_valid_to >'2024-01-01'OR dbt_valid_to IS NULL)