# Incremental Materialization

Incremental models allow you to update only new or modified data in your warehouse instead of rebuilding entire tables. This optimization is particularly valuable when working with:

* Large datasets (millions/billions of rows)
* Computationally expensive transformations
* Time-series data with frequent updates

### Basic Configuration

{% hint style="info" %}
While these examples use Snowflake syntax, the core concepts apply to most data warehouses. Specific syntax and available features may vary by platform.
{% endhint %}

```sql
{{
  config(
    materialized='incremental',
    unique_key='id'
  )
}}

SELECT 
    id,
    status,
    amount,
    updated_at
FROM {{ ref('stg_source') }}

{% if is_incremental() %}
    -- This filter will only be applied on an incremental run
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

**Key Components**

1. **Materialization Config**: Set `materialized='incremental'` in your config block
2. **Unique Key**: Define what makes each row unique (a single column or a list of columns)
3. **Incremental Logic**: Use the `is_incremental()` macro to filter for new or changed records

On the first run (or when running with `--full-refresh`), `is_incremental()` returns `false` and the model is built from the full `SELECT`; on subsequent runs the filter limits processing to new data.

***

### Incremental Strategies

dbt™ supports several strategies for incremental models, each with specific use cases:

| Strategy                                                                                                                                                                    | Description                                            | Best For                                  |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ----------------------------------------- |
| [Merge](https://docs.paradime.io/app-help/concepts/dbt-fundamentals/model-materializations/incremental-materialization/using-merge-for-incremental-models)                  | Updates existing records and inserts new ones          | Tables requiring both inserts and updates |
| [Delete+Insert](https://docs.paradime.io/app-help/concepts/dbt-fundamentals/model-materializations/incremental-materialization/using-delete+insert-for-incremental-models)  | Deletes matching records and reinserts new versions    | Batch updates where most records change   |
| [Append-Only](https://docs.paradime.io/app-help/concepts/dbt-fundamentals/model-materializations/incremental-materialization/using-append-for-incremental-models)           | Simply adds new records without updating existing ones | Event logs, immutable data                |
| [Microbatch (Beta)](https://docs.paradime.io/app-help/concepts/dbt-fundamentals/model-materializations/incremental-materialization/using-microbatch-for-incremental-models) | Processes updates in smaller batches                   | Very large time-series datasets           |

For detailed examples and configuration options for each strategy, see their dedicated pages.
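
A strategy is selected with the `incremental_strategy` config. For example, to use merge explicitly (assuming your warehouse supports it, as Snowflake does):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
  )
}}
```

If no strategy is set, dbt falls back to the adapter's default (merge on Snowflake).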

***

### Advanced Features

**1. Schema Change Management**

Handle column additions or removals with the `on_schema_change` parameter:

```sql
{{
  config(
    materialized='incremental',
    on_schema_change='sync_all_columns'  -- Options: ignore, fail, append_new_columns, sync_all_columns
  )
}}
```

Options explained:

* `sync_all_columns`: Adds new columns and removes missing ones (recommended)
* `fail`: Halts execution when the schema changes (useful during development)
* `ignore`: Maintains the existing schema (the default; use cautiously)
* `append_new_columns`: Adds new columns without removing old ones

Note that none of these options backfill values for newly added columns in rows that already exist; run with `--full-refresh` if you need historical values populated.

**2. Incremental Predicates**

Optimize performance for large datasets:

```sql
{{
  config(
    materialized='incremental',
    incremental_predicates=[
      "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"
    ],
    cluster_by=['session_start']
  )
}}
```

This configuration:

* Limits the scan of existing data
* Improves merge performance
* Works with clustering for better query optimization
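
Conceptually, the predicate above is injected into the `ON` clause of the `MERGE` statement dbt generates, so the scan of the destination table is pruned to the last seven days. A simplified sketch of the generated SQL (table names here are hypothetical, and the exact statement varies by adapter):

```sql
MERGE INTO analytics.sessions AS DBT_INTERNAL_DEST
USING sessions__dbt_tmp AS DBT_INTERNAL_SOURCE
    ON  DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
    AND DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...
```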

**3. Strategy-Specific Configurations**

Control which columns are updated in merge operations:

```sql
{{
  config(
    materialized='incremental',
    merge_update_columns=['email', 'ip_address']  -- Only update these columns
  )
}}
```

Use `merge_update_columns` to update only the listed columns, or `merge_exclude_columns` (e.g. `['created_at']`) to update every column except those listed. The two options are mutually exclusive, so configure one or the other.

**4. Custom Strategies**

Create your own incremental strategy by defining a macro named `get_incremental_<strategy_name>_sql`; models can then reference the strategy by name:

```sql
-- macros/my_custom_strategies.sql
{% macro get_incremental_insert_only_sql(arg_dict) %}
  {% do return(some_custom_macro_with_sql(
    arg_dict["target_relation"],
    arg_dict["temp_relation"],
    arg_dict["unique_key"],
    arg_dict["dest_columns"],
    arg_dict["incremental_predicates"]
  )) %}
{% endmacro %}

-- models/my_model.sql
{{
  config(
    materialized='incremental',
    incremental_strategy='insert_only'
  )
}}
```

***

### Best Practices

**1. Handle Late-Arriving Data**

Data doesn't always arrive in perfect chronological order. Include a buffer period in your incremental logic:

```sql
{% if is_incremental() %}
    -- Look back 3 days to catch late-arriving data
    WHERE updated_at > DATEADD(days, -3, (SELECT MAX(updated_at) FROM {{ this }}))
{% endif %}
```
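
The lookback window can also be made configurable with a dbt project variable, so it can be tuned per environment without editing the model (the `lookback_days` variable name here is illustrative):

```sql
{% if is_incremental() %}
    -- Lookback window configurable via --vars '{lookback_days: 7}', defaulting to 3 days
    WHERE updated_at > DATEADD(day, -{{ var('lookback_days', 3) }}, (SELECT MAX(updated_at) FROM {{ this }}))
{% endif %}
```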

**2. Optimize Performance**

Use appropriate configurations to improve query performance and efficiency:

```sql
{{
  config(
    materialized='incremental',
    cluster_by=['date_field', 'customer_id'],  -- Improve query performance
    merge_update_columns=['status', 'amount']  -- Minimize update overhead
  )
}}
```

**3. Regular Maintenance**

Incremental tables can drift from the source over time, for example when late data arrives outside your lookback window or when model logic changes. Periodically rebuild the entire table from scratch:

```bash
dbt run --full-refresh --select model_name
```

**4. Multiple Column Keys**

When a single column isn't enough to identify unique records:

```sql
{{
  config(
    materialized='incremental',
    unique_key=['customer_id', 'order_date']
  )
}}
```
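
A composite key pairs naturally with an incremental filter on one of its columns. A minimal sketch, assuming a hypothetical `stg_orders` staging model:

```sql
{{
  config(
    materialized='incremental',
    unique_key=['customer_id', 'order_date']
  )
}}

SELECT
    customer_id,
    order_date,
    SUM(amount) AS daily_total
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
    -- Reprocess the latest day so partial days are overwritten, not duplicated
    WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}

GROUP BY customer_id, order_date
```

Because both key columns identify each row, reprocessed rows for the latest day replace their earlier versions rather than creating duplicates.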
