Incremental Materialization

Incremental models allow you to update only new or modified data in your warehouse instead of rebuilding entire tables. This optimization is particularly valuable when working with:

  • Large datasets (millions/billions of rows)

  • Computationally expensive transformations

  • Time-series data with frequent updates

Basic Configuration

While these examples use Snowflake syntax, the core concepts apply to most data warehouses. Specific syntax and available features may vary by platform.

{{
  config(
    materialized='incremental',
    unique_key='id'
  )
}}

SELECT 
    id,
    status,
    amount,
    updated_at
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
    -- This filter will only be applied on an incremental run
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

Key Components

  1. Materialization Config: Set materialized='incremental' in your config block

  2. Unique Key: Define what makes each row unique (single column or multiple columns)

  3. Incremental Logic: Use the is_incremental() macro to filter for new/changed records


Incremental Strategies

dbt™ supports several strategies for incremental models, each with specific use cases:

Strategy
Description
Best For

Updates existing records and inserts new ones

Tables requiring both inserts and updates

Deletes matching records and reinserts new versions

Batch updates where most records change

Simply adds new records without updating existing ones

Event logs, immutable data

Processes updates in smaller batches

Very large time-series datasets

For detailed examples and configuration options for each strategy, see their dedicated pages.


Advanced Features

1. Schema Change Management

Handle column additions or removals with the on_schema_change parameter:

{{
  config(
    materialized='incremental',
    on_schema_change='sync_all_columns'  -- Options: ignore, fail, append_new_columns, sync_all_columns
  )
}}

Options explained:

  • sync_all_columns: Automatically adapts to column changes (recommended)

  • fail: Halts execution when schema changes (useful during development)

  • ignore: Maintains existing schema (use cautiously)

  • append_new_columns: Adds new columns without removing old ones

2. Incremental Predicates

Optimize performance for large datasets:

{{
  config(
    materialized='incremental',
    incremental_predicates=[
      "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"
    ],
    cluster_by=['session_start']
  )
}}

This configuration:

  • Limits the scan of existing data

  • Improves merge performance

  • Works with clustering for better query optimization

3. Strategy-Specific Configurations

Control column updates in merge operations:

{{
  config(
    materialized='incremental',
    merge_update_columns=['email', 'ip_address'],  -- Only update these columns
    merge_exclude_columns=['created_at']  -- Never update these columns
  )
}}

4. Custom Strategies

Create your own incremental strategy:

-- macros/my_custom_strategies.sql
{% macro get_incremental_insert_only_sql(arg_dict) %}
  {% do return(some_custom_macro_with_sql(
    arg_dict["target_relation"],
    arg_dict["temp_relation"],
    arg_dict["unique_key"],
    arg_dict["dest_columns"],
    arg_dict["incremental_predicates"]
  )) %}
{% endmacro %}

-- models/my_model.sql
{{
  config(
    materialized='incremental',
    incremental_strategy='insert_only'
  )
}}

Best Practices

1. Handle Late-Arriving Data

Data doesn't always arrive in perfect chronological order. Include a buffer period in your incremental logic:

{% if is_incremental() %}
    -- Look back 3 days to catch late-arriving data
    WHERE updated_at > DATEADD(days, -3, (SELECT MAX(updated_at) FROM {{ this }}))
{% endif %}

2. Optimize Performance

Use appropriate configurations to improve query performance and efficiency:

{{
  config(
    materialized='incremental',
    cluster_by=['date_field', 'customer_id'],  -- Improve query performance
    merge_update_columns=['status', 'amount']  -- Minimize update overhead
  )
}}

3. Regular Maintenance

To prevent potential data inconsistencies that might accumulate over time, periodically rebuild your entire table:

dbt run --full-refresh --select model_name

4. Multiple Column Keys

When a single column isn't enough to identify unique records:

{{
  config(
    materialized='incremental',
    unique_key=['customer_id', 'order_date']
  )
}}

Last updated

Was this helpful?