Incremental models allow you to update only new or modified data in your warehouse instead of rebuilding entire tables. This optimization is particularly valuable when working with:
Large datasets (millions/billions of rows)
Computationally expensive transformations
Time-series data with frequent updates
Basic Configuration
While these examples use Snowflake syntax, the core concepts apply to most data warehouses. Specific syntax and available features may vary by platform.
Here's a simple incremental model configuration:
{{
config(
materialized='incremental',
unique_key='id'
)
}}
SELECT
id,
status,
amount,
updated_at
FROM {{ ref('stg_source') }}
{% if is_incremental() %}
-- This filter will only be applied on an incremental run
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Key Components
Materialization Config: Set materialized='incremental' in your config block
Unique Key: Define what makes each row unique (a single column, or a list of columns for composite keys; see the sketch after this list)
Incremental Logic: Use the is_incremental() macro to filter for new/changed records
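When no single column identifies a row, unique_key also accepts a list of columns. A minimal sketch with illustrative model and column names:
{{
config(
materialized='incremental',
unique_key=['order_id', 'order_line_number']
)
}}
SELECT
order_id,
order_line_number,
status,
updated_at
FROM {{ ref('stg_order_lines') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}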
Incremental Strategies
Understanding which strategy to use is crucial for optimal performance. Snowflake supports four strategies, each with specific use cases:
The merge strategy uses Snowflake's MERGE statement to update existing records and insert new ones. This strategy provides the most control over how your incremental model is updated but requires more computational resources.
When to use:
Tables requiring both inserts and updates
When data consistency is critical
When specific columns need updating
Advantages:
Most flexible strategy
Precise control over column updates
Maintains data consistency
Handles complex update patterns
Trade-offs:
Slower than other strategies
More resource-intensive
Can be costly for very large datasets
Example Implementation:
{{
config(
materialized='incremental',
incremental_strategy='merge',
unique_key='event_id'
)
}}
SELECT
event_id,
event_type,
event_time
FROM {{ ref('stg_source') }}
{% if is_incremental() %}
WHERE event_time >= (SELECT MAX(event_time) FROM {{ this }})
{% endif %}
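Under the hood, dbt compiles this into a Snowflake MERGE statement roughly like the following. This is a simplified sketch: dbt first stages the filtered rows in a temporary relation (called new_rows_staging here for illustration), and the destination name is illustrative.
MERGE INTO your_database.your_schema.incremental_model AS DBT_INTERNAL_DEST
USING (
SELECT event_id, event_type, event_time FROM new_rows_staging
) AS DBT_INTERNAL_SOURCE
ON DBT_INTERNAL_SOURCE.event_id = DBT_INTERNAL_DEST.event_id
WHEN MATCHED THEN UPDATE SET
event_type = DBT_INTERNAL_SOURCE.event_type,
event_time = DBT_INTERNAL_SOURCE.event_time
WHEN NOT MATCHED THEN INSERT (event_id, event_type, event_time)
VALUES (DBT_INTERNAL_SOURCE.event_id, DBT_INTERNAL_SOURCE.event_type, DBT_INTERNAL_SOURCE.event_time)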
The delete+insert strategy is like a "replace-all" approach for updated records. It first removes all records that match your incremental criteria and then inserts the new versions of those records. Think of it as replacing an entire page in a book rather than editing individual words.
When to use:
Batch updates where most records change
When merge performance becomes a bottleneck
Simpler change patterns
Large-scale updates
Advantages:
Better performance than merge for bulk updates
Simpler execution plan
Good for high-volume changes
Trade-offs:
Less granular control than merge
All columns are updated (no column-specific updates)
Potential for higher resource usage during delete phase
Example Implementation:
{{
config(
materialized='incremental',
incremental_strategy='delete+insert',
unique_key='id'
)
}}
SELECT
id,
status,
batch_date
FROM {{ ref('stg_source') }}
{% if is_incremental() %}
WHERE batch_date >= (SELECT MAX(batch_date) FROM {{ this }})
{% endif %}
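Conceptually, this run compiles to two statements against the existing table. A simplified sketch, where new_rows_staging stands in for dbt's temporary staging relation and the table name is illustrative:
DELETE FROM your_database.your_schema.incremental_model
WHERE id IN (SELECT id FROM new_rows_staging);
INSERT INTO your_database.your_schema.incremental_model (id, status, batch_date)
SELECT id, status, batch_date FROM new_rows_staging;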
The append-only strategy is the simplest approach - it just adds new records to your table without updating existing ones. Think of it as adding new entries to a log book where historical entries are never changed. This strategy is perfect for event logging and similar use cases where historical data remains unchanged.
When to use:
Event logs
Immutable data
Time-series data without updates
When duplicates are acceptable
Advantages:
Fastest performance
Simplest execution plan
Minimal resource usage
No need for unique key
Trade-offs:
No update capability
Can create duplicate data
Requires downstream deduplication if needed (see the dedup sketch after the example below)
Example Implementation:
{{
config(
materialized='incremental',
incremental_strategy='append'
)
}}
SELECT
event_id,
event_type,
event_timestamp
FROM {{ ref('stg_source') }}
{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
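Because append-only runs can insert the same record more than once, downstream models often deduplicate on read. A minimal sketch of a hypothetical downstream model that keeps the latest row per event_id, using Snowflake's QUALIFY clause (the model name in ref() is illustrative):
-- models/marts/events_deduped.sql (hypothetical downstream model)
SELECT
event_id,
event_type,
event_timestamp
FROM {{ ref('append_only_events') }}
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_timestamp DESC) = 1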
The microbatch strategy processes incremental updates in smaller batches, designed specifically for large time-series datasets.
When to use:
Very large time-series datasets
When reliability is crucial
When processing needs to be more resilient
Advantages:
More efficient for large datasets
Better error handling
Improved resilience
Reduced memory usage
Trade-offs:
More complex setup
Requires careful batch size tuning
Requires a recent dbt version (1.9 or later) and adapter support
Example Implementation:
Microbatch (introduced in dbt 1.9) replaces the is_incremental() filter with batch configuration: you declare which column is the event_time, a begin date for the first build, and a batch_size, and dbt generates the time-range filter for each batch. A sketch with illustrative values:
{{
config(
materialized='incremental',
incremental_strategy='microbatch',
event_time='event_time',
begin='2024-01-01',
batch_size='day'
)
}}
SELECT
id,
event_type,
event_time
FROM {{ ref('stg_source') }}
Upstream models can also set event_time in their config so that each batch reads only the matching slice of the ref.
Configuration Examples
This guide walks you through building and testing an incremental model with different strategies. You'll learn how to:
Set up an initial incremental model
Test how data changes are handled
Verify results in your warehouse
Try different incremental strategies
Requirements
Access to your data warehouse
Basic understanding of dbt™ models
dbt™ project set up
Step 1: Initial Setup and Testing
First, create the staging model:
-- models/staging/stg_test_source.sql
WITH source_data AS (
SELECT 1 as id,
'active' as status,
100 as amount,
CURRENT_TIMESTAMP() as updated_at
UNION ALL
SELECT 2, 'pending', 200, CURRENT_TIMESTAMP()
UNION ALL
SELECT 3, 'active', 300, CURRENT_TIMESTAMP()
)
SELECT * FROM source_data
Run staging model:
dbt run --select stg_test_source
Verify staging data in Snowflake:
SELECT
*
FROM
your_database.your_schema.stg_test_source
ORDER BY
id
Expected Result:
ID | STATUS | AMOUNT | UPDATED_AT
1 | active | 100 | [timestamp]
2 | pending | 200 | [timestamp]
3 | active | 300 | [timestamp]
Create the incremental model:
-- models/incremental_test.sql
{{
config(
materialized='incremental',
unique_key='id'
)
}}
SELECT
id,
status,
amount,
updated_at
FROM {{ ref('stg_test_source') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Run incremental model:
dbt run --select incremental_test
Verify incremental model in Snowflake:
SELECT
*
FROM
your_database.your_schema.incremental_test
ORDER BY
id
Expected Result: Same as staging model (3 rows)
Step 2: Testing Merge Strategy (Default)
Update staging model with changes:
-- models/staging/stg_test_source.sql
WITH source_data AS (
SELECT 1 as id,
'inactive' as status, -- Changed status
150 as amount, -- Changed amount
CURRENT_TIMESTAMP() as updated_at
UNION ALL
SELECT 4, 'new', 400, CURRENT_TIMESTAMP() -- New record
)
SELECT * FROM source_data
Run staging model:
dbt run --select stg_test_source
Verify staging data:
SELECT
*
FROM
your_database.your_schema.stg_test_source
ORDER BY
id;
Expected Result:
ID | STATUS | AMOUNT | UPDATED_AT
1 | inactive | 150 | [new_timestamp]
4 | new | 400 | [new_timestamp]
Run incremental model:
dbt run --select incremental_test
Verify incremental model:
SELECT
*
FROM
your_database.your_schema.incremental_test
ORDER BY
id
Expected Result:
ID | STATUS | AMOUNT | UPDATED_AT
1 | inactive | 150 | [new_timestamp] -- Updated
2 | pending | 200 | [old_timestamp] -- Unchanged
3 | active | 300 | [old_timestamp] -- Unchanged
4 | new | 400 | [new_timestamp] -- New
Step 3: Testing Delete+Insert Strategy
Update incremental model config (keep same staging data):
-- models/incremental_test.sql
{{
config(
materialized='incremental',
incremental_strategy='delete+insert',
unique_key='id'
)
}}
[Rest of the model SQL stays the same]
Run incremental model:
dbt run --select incremental_test
Verify in Snowflake:
SELECT
*
FROM
your_database.your_schema.incremental_test
ORDER BY
id
Expected Result: Similar to merge strategy but with updated timestamps
Step 4: Testing Append Strategy
Update incremental model config:
-- models/incremental_test.sql
{{
config(
materialized='incremental',
incremental_strategy='append'
)
}}
[Rest of the model SQL stays the same]
Run incremental model:
dbt run --full-refresh --select incremental_test
Verify initial state:
SELECT
*
FROM
your_database.your_schema.incremental_test
ORDER BY
id
Update staging data to trigger append:
-- models/staging/stg_test_source.sql
WITH source_data AS (
SELECT 1 as id,
'inactive' as status,
150 as amount,
CURRENT_TIMESTAMP() as updated_at
UNION ALL
SELECT 5, 'newest', 500, CURRENT_TIMESTAMP() -- Another new record
)
SELECT * FROM source_data
Run both models:
dbt run --select stg_test_source
dbt run --select incremental_test
Verify append results:
SELECT
*
FROM
your_database.your_schema.incremental_test
ORDER BY
updated_at
Expected Result: Previous records plus new appended records
Advanced Features
1. Schema Change Management
Handle column additions or removals with the on_schema_change parameter:
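A minimal sketch: append_new_columns adds any new source columns to the existing target table (other accepted values are ignore, which is the default, fail, and sync_all_columns):
{{
config(
materialized='incremental',
unique_key='id',
on_schema_change='append_new_columns'
)
}}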
2. Strategy-Specific Configurations
Control which columns a merge updates. Use merge_update_columns to update only the listed columns, or merge_exclude_columns to update everything except the listed ones (for example, merge_exclude_columns=['created_at'] keeps each row's original creation timestamp). Configure one or the other, not both:
{{
config(
materialized='incremental',
incremental_strategy='merge',
unique_key='id',
merge_update_columns=['email', 'ip_address']
)
}}
3. Handling Late-Arriving Data
Data doesn't always arrive in perfect chronological order. Include a buffer period in your incremental logic:
{% if is_incremental() %}
-- Look back 3 days to catch late-arriving data
WHERE updated_at > DATEADD(days, -3, (SELECT MAX(updated_at) FROM {{ this }}))
{% endif %}
4. Optimize Performance
Use appropriate configurations to improve query performance and efficiency; on large incremental tables, defining a clustering key works with Snowflake's pruning for better query optimization:
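A minimal sketch using the dbt-Snowflake cluster_by config (column choices are illustrative; cluster on the columns you filter and join on most often):
{{
config(
materialized='incremental',
unique_key='id',
cluster_by=['to_date(updated_at)']
)
}}
SELECT
id,
status,
amount,
updated_at
FROM {{ ref('stg_source') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}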