Incremental Materialization
Incremental models let you process only new or modified data in your warehouse instead of rebuilding entire tables. This optimization is particularly valuable when working with:
Large datasets (millions/billions of rows)
Computationally expensive transformations
Time-series data with frequent updates
Basic Configuration
{{
  config(
    materialized='incremental',
    unique_key='id'
  )
}}

SELECT
  id,
  status,
  amount,
  updated_at
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
  -- This filter will only be applied on an incremental run
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Key Components
Materialization Config: Set materialized='incremental' in your config block
Unique Key: Define what makes each row unique (a single column or multiple columns)
Incremental Logic: Use the is_incremental() macro to filter for new/changed records
Incremental Strategies
dbt™ supports several strategies for incremental models, each with specific use cases:
merge: Updates existing records and inserts new ones. Best for tables requiring both inserts and updates.
delete+insert: Deletes matching records and reinserts new versions. Best for batch updates where most records change.
append: Simply adds new records without updating existing ones. Best for event logs and other immutable data.
microbatch: Processes updates in smaller time-based batches. Best for very large time-series datasets.
For detailed examples and configuration options for each strategy, see their dedicated pages.
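The strategy itself is chosen with the incremental_strategy config. As a minimal sketch, reusing the stg_source model from the basic example (which strategies are available depends on your adapter):

{{
  config(
    materialized='incremental',
    incremental_strategy='merge',  -- or delete+insert, append, microbatch, depending on adapter support
    unique_key='id'
  )
}}

SELECT id, status, amount, updated_at
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}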
Advanced Features
1. Schema Change Management
Handle column additions or removals with the on_schema_change parameter:
{{
  config(
    materialized='incremental',
    on_schema_change='sync_all_columns'  -- Options: ignore, fail, append_new_columns, sync_all_columns
  )
}}
Options explained:
sync_all_columns: Automatically adapts to column changes (recommended)
fail: Halts execution when the schema changes (useful during development)
ignore: Maintains the existing schema (use cautiously)
append_new_columns: Adds new columns without removing old ones
2. Incremental Predicates
Optimize performance for large datasets:
{{
  config(
    materialized='incremental',
    incremental_predicates=[
      "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"
    ],
    cluster_by=['session_start']
  )
}}
This configuration:
Limits the scan of existing data
Improves merge performance
Works with clustering for better query optimization
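Under the hood, the predicate is added to the join condition of the MERGE statement dbt generates, so only recent rows of the existing table are scanned when matching. The sketch below is a simplified, hypothetical illustration of that generated SQL (Snowflake-style syntax; table and column names are placeholders, and the exact statement varies by adapter):

merge into analytics.sessions as DBT_INTERNAL_DEST
using analytics.sessions__dbt_tmp as DBT_INTERNAL_SOURCE
  on DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
  -- incremental_predicates are appended here, limiting the scan of existing data
  and DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)
when matched then update set
  status = DBT_INTERNAL_SOURCE.status,
  session_start = DBT_INTERNAL_SOURCE.session_start
when not matched then insert (id, status, session_start)
  values (DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE.status, DBT_INTERNAL_SOURCE.session_start);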
3. Strategy-Specific Configurations
Control which columns are updated in merge operations (use merge_update_columns or merge_exclude_columns, not both):
{{
  config(
    materialized='incremental',
    merge_update_columns=['email', 'ip_address']  -- Only update these columns
    -- Alternatively: merge_exclude_columns=['created_at']  -- Never update these columns
  )
}}
4. Custom Strategies
Create your own incremental strategy:
-- macros/my_custom_strategies.sql
{% macro get_incremental_insert_only_sql(arg_dict) %}
  {% do return(some_custom_macro_with_sql(
    arg_dict["target_relation"],
    arg_dict["temp_relation"],
    arg_dict["unique_key"],
    arg_dict["dest_columns"],
    arg_dict["incremental_predicates"]
  )) %}
{% endmacro %}

-- models/my_model.sql
{{
  config(
    materialized='incremental',
    incremental_strategy='insert_only'
  )
}}
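The helper some_custom_macro_with_sql referenced above is not defined by dbt; it stands in for whatever SQL-building logic your strategy needs. A minimal sketch of what it might look like for an insert-only strategy, assuming it simply appends rows from the temp relation:

-- macros/some_custom_macro_with_sql.sql (hypothetical helper)
{% macro some_custom_macro_with_sql(target_relation, temp_relation, unique_key, dest_columns, incremental_predicates) %}
  {%- set dest_cols_csv = dest_columns | map(attribute='name') | join(', ') -%}
  -- Insert-only: append rows from the temp relation without updating or deleting existing rows
  insert into {{ target_relation }} ({{ dest_cols_csv }})
  select {{ dest_cols_csv }}
  from {{ temp_relation }}
{% endmacro %}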
Best Practices
1. Handle Late-Arriving Data
Data doesn't always arrive in perfect chronological order. Include a buffer period in your incremental logic:
{% if is_incremental() %}
  -- Look back 3 days to catch late-arriving data
  WHERE updated_at > DATEADD(days, -3, (SELECT MAX(updated_at) FROM {{ this }}))
{% endif %}
2. Optimize Performance
Use appropriate configurations to improve query performance and efficiency:
{{
  config(
    materialized='incremental',
    cluster_by=['date_field', 'customer_id'],  -- Improve query performance
    merge_update_columns=['status', 'amount']  -- Minimize update overhead
  )
}}
3. Regular Maintenance
To prevent potential data inconsistencies that might accumulate over time, periodically rebuild your entire table:
dbt run --full-refresh --select model_name
4. Multiple Column Keys
When a single column isn't enough to identify unique records:
{{
  config(
    materialized='incremental',
    unique_key=['customer_id', 'order_date']
  )
}}