Incremental Materialization
Incremental models allow you to update only new or modified data in your warehouse instead of rebuilding entire tables. This optimization is particularly valuable when working with:
Large datasets (millions/billions of rows)
Computationally expensive transformations
Time-series data with frequent updates
Basic Configuration
{{
config(
materialized='incremental',
unique_key='id'
)
}}
SELECT
id,
status,
amount,
updated_at
FROM {{ ref('stg_source') }}
{% if is_incremental() %}
-- This filter will only be applied on an incremental run
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}Key Components
Materialization Config: Set
materialized='incremental'in your config blockUnique Key: Define what makes each row unique (single column or multiple columns)
Incremental Logic: Use the
is_incremental()macro to filter for new/changed records
Incremental Strategies
dbt™ supports several strategies for incremental models, each with specific use cases:
Deletes matching records and reinserts new versions
Batch updates where most records change
For detailed examples and configuration options for each strategy, see their dedicated pages.
Advanced Features
1. Schema Change Management
Handle column additions or removals with the on_schema_change parameter:
Options explained:
sync_all_columns: Automatically adapts to column changes (recommended)fail: Halts execution when schema changes (useful during development)ignore: Maintains existing schema (use cautiously)append_new_columns: Adds new columns without removing old ones
2. Incremental Predicates
Optimize performance for large datasets:
This configuration:
Limits the scan of existing data
Improves merge performance
Works with clustering for better query optimization
3. Strategy-Specific Configurations
Control column updates in merge operations:
4. Custom Strategies
Create your own incremental strategy:
Best Practices
1. Handle Late-Arriving Data
Data doesn't always arrive in perfect chronological order. Include a buffer period in your incremental logic:
2. Optimize Performance
Use appropriate configurations to improve query performance and efficiency:
3. Regular Maintenance
To prevent potential data inconsistencies that might accumulate over time, periodically rebuild your entire table:
4. Multiple Column Keys
When a single column isn't enough to identify unique records:
Last updated
Was this helpful?