Incremental models allow you to update only new or modified data in your warehouse instead of rebuilding entire tables. This optimization is particularly valuable when working with:
Large datasets (millions/billions of rows)
Computationally expensive transformations
Basic Configuration
While these examples use Snowflake syntax, the core concepts apply to most data warehouses. Specific syntax and available features may vary by platform.
Here's a simple incremental model configuration:
{{ config(
    materialized='incremental',
    unique_key='id'
) }}

SELECT
    id,
    status,
    amount,
    updated_at
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
  -- This filter will only be applied on an incremental run
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Key Components
Materialization Config: Set materialized='incremental' in your config block
Unique Key: Define what makes each row unique (single column or multiple columns)
Incremental Logic: Use the is_incremental() macro to filter for new/changed records
Incremental Strategies
Understanding which strategy to use is crucial for optimal performance. On Snowflake, dbt supports three strategies, each with specific use cases:
Merge Strategy (Default)
The merge strategy uses Snowflake's MERGE statement to update existing records and insert new ones. This strategy provides the most control over how your incremental model is updated but requires more computational resources.
When to use:
Tables requiring both inserts and updates
When you need to update specific columns
When data consistency is critical
Advantages:
Most flexible strategy
Precise control over column updates
Maintains data consistency
Handles complex update patterns
Trade-offs:
Slower than other strategies
More resource-intensive
Can be costly for very large datasets
Example Implementation:
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
) }}

SELECT
    id,
    status,
    amount,
    updated_at
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
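Under the hood, dbt stages the filtered rows in a temporary object and issues a single MERGE keyed on your unique_key. A simplified sketch of the generated statement, assuming the model is named incremental_test in an analytics schema (the exact SQL and internal object names vary by dbt version):

MERGE INTO analytics.incremental_test AS dest
USING incremental_test__dbt_tmp AS src
    ON src.id = dest.id
WHEN MATCHED THEN UPDATE SET
    status = src.status,
    amount = src.amount,
    updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (id, status, amount, updated_at)
    VALUES (src.id, src.status, src.amount, src.updated_at);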
Delete+Insert Strategy
The delete+insert strategy is like a "replace-all" approach for updated records. It first removes all records that match your incremental criteria and then inserts the new versions of those records. Think of it as replacing an entire page in a book rather than editing individual words.
When to use:
Batch updates where most records change
When merge performance becomes a bottleneck
Simpler change patterns
Large-scale updates
Advantages:
Better performance than merge for bulk updates
Simpler execution plan
Good for high-volume changes
Trade-offs:
Less granular control than merge
All columns are updated (no column-specific updates)
Potential for higher resource usage during delete phase
Example Implementation:
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='id'
) }}

SELECT
    id,
    status,
    batch_date
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
  WHERE batch_date >= (SELECT MAX(batch_date) FROM {{ this }})
{% endif %}
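For comparison, delete+insert compiles to roughly two statements instead of one MERGE. A simplified sketch using the same illustrative names as above (actual SQL varies by dbt version):

DELETE FROM analytics.incremental_test
WHERE id IN (SELECT id FROM incremental_test__dbt_tmp);

INSERT INTO analytics.incremental_test (id, status, batch_date)
SELECT id, status, batch_date
FROM incremental_test__dbt_tmp;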
Append Strategy
The append-only strategy is the simplest approach: it just adds new records to your table without updating existing ones. Think of it as adding new entries to a log book where historical entries are never changed. This strategy is perfect for event logging and similar use cases where historical data remains unchanged.
When to use:
Event logs
Immutable data
Time-series data without updates
When duplicates are acceptable
Advantages:
Fastest performance
Simplest execution plan
Minimal resource usage
No need for unique key
Trade-offs:
No update capability
Can create duplicate data
Requires downstream deduplication if needed
Example Implementation:
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

SELECT
    event_id,
    event_type,
    event_timestamp
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
  WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
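If downstream consumers can't tolerate duplicates, a common pattern is a thin deduplication model on top of the append-only table. A minimal sketch (the model names here are hypothetical) that keeps the latest row per event_id using Snowflake's QUALIFY clause:

-- models/events_deduped.sql
SELECT *
FROM {{ ref('incremental_events') }}
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY event_id
    ORDER BY event_timestamp DESC
) = 1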
Configuration Examples
This guide walks you through building and testing an incremental model with different strategies. You'll learn how to:
Set up an initial incremental model
Test how data changes are handled
Verify results in your warehouse
Try different incremental strategies
Requirements
Access to your data warehouse
Basic understanding of dbt™ models
dbt™ project set up
Step 1: Initial Setup and Testing
First, create the staging model:
-- models/staging/stg_test_source.sql
WITH source_data AS (
    SELECT 1 AS id, 'active' AS status, 100 AS amount, CURRENT_TIMESTAMP() AS updated_at
    UNION ALL
    SELECT 2, 'pending', 200, CURRENT_TIMESTAMP()
    UNION ALL
    SELECT 3, 'active', 300, CURRENT_TIMESTAMP()
)

SELECT * FROM source_data
Run staging model:
dbt run --select stg_test_source
Verify staging data in Snowflake:
SELECT * FROM your_database.your_schema.stg_test_source ORDER BY id;
Expected Result:
ID | STATUS | AMOUNT | UPDATED_AT
1 | active | 100 | [timestamp]
2 | pending | 200 | [timestamp]
3 | active | 300 | [timestamp]
Create the incremental model:
-- models/incremental_test.sql
{{ config(
    materialized='incremental',
    unique_key='id'
) }}

SELECT
    id,
    status,
    amount,
    updated_at
FROM {{ ref('stg_test_source') }}

{% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Run incremental model:
dbt run --select incremental_test
Verify incremental model in Snowflake:
SELECT * FROM your_database.your_schema.incremental_test ORDER BY id;
Expected Result: Same as staging model (3 rows)
Step 2: Testing Merge Strategy (Default)
Update staging model with changes:
-- models/staging/stg_test_source.sql
WITH source_data AS (
    SELECT
        1 AS id,
        'inactive' AS status,  -- Changed status
        150 AS amount,         -- Changed amount
        CURRENT_TIMESTAMP() AS updated_at
    UNION ALL
    SELECT 4, 'new', 400, CURRENT_TIMESTAMP()  -- New record
)

SELECT * FROM source_data
Run staging model:
dbt run --select stg_test_source
Verify staging data:
SELECT * FROM your_database.your_schema.stg_test_source ORDER BY id;
Expected Result:
ID | STATUS | AMOUNT | UPDATED_AT
1 | inactive | 150 | [new_timestamp]
4 | new | 400 | [new_timestamp]
Run incremental model:
dbt run --select incremental_test
Verify incremental model:
SELECT * FROM your_database.your_schema.incremental_test ORDER BY id;
Expected Result:
ID | STATUS | AMOUNT | UPDATED_AT
1 | inactive | 150 | [new_timestamp] -- Updated
2 | pending | 200 | [old_timestamp] -- Unchanged
3 | active | 300 | [old_timestamp] -- Unchanged
4 | new | 400 | [new_timestamp] -- New
Step 3: Testing Delete+Insert Strategy
Update incremental model config (keep same staging data):
-- models/incremental_test.sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='id'
) }}

[Rest of the model SQL stays the same]
Run incremental model:
dbt run --select incremental_test
Verify in Snowflake:
SELECT * FROM your_database.your_schema.incremental_test ORDER BY id;
Expected Result: Similar to merge strategy but with updated timestamps
Step 4: Testing Append Strategy
Update incremental model config:
-- models/incremental_test.sql
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

[Rest of the model SQL stays the same]
Run the incremental model with a full refresh (rebuilding the table from scratch is the safest way to switch incremental strategies):
dbt run --full-refresh --select incremental_test
Verify initial state:
SELECT * FROM your_database.your_schema.incremental_test ORDER BY id;
Update staging data to trigger append:
-- models/staging/stg_test_source.sql
WITH source_data AS (
    SELECT 1 AS id, 'inactive' AS status, 150 AS amount, CURRENT_TIMESTAMP() AS updated_at
    UNION ALL
    SELECT 5, 'newest', 500, CURRENT_TIMESTAMP()  -- Another new record
)

SELECT * FROM source_data
Run both models, then verify:

dbt run --select stg_test_source incremental_test

SELECT * FROM your_database.your_schema.incremental_test ORDER BY updated_at;
Expected Result: Previous records plus the newly appended rows. Note that ID 1 now appears twice: append never updates existing rows, so the changed version is added alongside the original.
Best Practices
1. Handle Late-Arriving Data
Data doesn't always arrive in perfect chronological order due to network delays, system outages, or batch processing. Always include a buffer period in your incremental logic:
{% if is_incremental() %}
  -- Look back 3 days to catch late-arriving data
  WHERE updated_at > DATEADD(days, -3, (SELECT MAX(updated_at) FROM {{ this }}))
{% endif %}
Adjust the look-back period (3 days in this example) based on your data patterns and service level agreements (SLAs).
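To tune the window per environment without editing the model, you could drive it from a dbt variable. A sketch assuming a hypothetical lookback_days project variable (falling back to 3 when unset):

{% if is_incremental() %}
  WHERE updated_at > DATEADD(days, -{{ var('lookback_days', 3) }},
      (SELECT MAX(updated_at) FROM {{ this }}))
{% endif %}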
2. Manage Schema Changes
Tables evolve over time with new columns being added or modified. Use the on_schema_change configuration to handle these changes:
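For example, append_new_columns tells dbt to add any new source columns to the target table on the next run; the other supported values are ignore (the default), fail, and sync_all_columns:

{{ config(
    materialized='incremental',
    unique_key='id',
    on_schema_change='append_new_columns'
) }}

Note that on_schema_change only adjusts the table structure; it does not backfill values for rows that already exist, so a full refresh may still be needed after significant schema changes.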
3. Use Composite Unique Keys
When no single column uniquely identifies a row, define the unique key as a list of columns. This is useful for scenarios like daily customer metrics where uniqueness depends on both customer and date.
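A sketch assuming hypothetical customer_id and metric_date columns and a stg_daily_metrics staging model:

{{ config(
    materialized='incremental',
    unique_key=['customer_id', 'metric_date']
) }}

SELECT
    customer_id,
    metric_date,
    total_orders
FROM {{ ref('stg_daily_metrics') }}

{% if is_incremental() %}
  WHERE metric_date > (SELECT MAX(metric_date) FROM {{ this }})
{% endif %}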
4. Custom Update Columns
Control exactly which columns should be updated during merge operations:
{{ config(
    materialized='incremental',
    unique_key='id',
    merge_update_columns=['status', 'amount', 'updated_at']  -- Only update these
) }}
This is particularly useful when you want to preserve certain column values while updating others.
Remember: While these examples use Snowflake syntax, the core concepts apply to most data warehouses. Specific syntax and available features may vary by platform.