Using Merge for Incremental Models

The merge method uses your database's MERGE statement (or equivalent) to update existing records and insert new ones. This method provides the most control over how your incremental model is updated but requires more computational resources.


When to Use Merge Method

  • Tables requiring both inserts and updates

  • When data consistency is critical

  • When specific columns need updating

  • For maintaining referential integrity


Advantages and Trade-offs

Advantages
Trade-offs

Most flexible method

Slower than other methods

Precise control over column updates

More resource-intensive

Maintains data consistency

Can be costly for very large datasets

Handles complex update patterns

Requires a unique key


Example Implementation

{{
  config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge'  // Default, can be omitted
  )
}}

SELECT 
    event_id,
    event_type,
    event_time,
    user_id,
    properties
FROM {{ ref('stg_events') }}

{% if is_incremental() %}
    WHERE event_time > (SELECT MAX(event_time) FROM {{ this }})
{% endif %}

In this example:

  • unique_key identifies which records should be updated

  • The WHERE clause in the is_incremental() block filters for new records

  • Existing records with matching event_id values will be updated

  • New records will be inserted


Fine-Tuning Merge Operations

You can precisely control which columns are updated during merge operations:

This configuration gives you fine-grained control over the merge process.


Step By Step Guide: Merge Method Implementation

Let's see how the merge method works with a practical example.

Step 1: Initial Setup and Testing

First, create the staging model:

Run staging model:

Verify staging data in Snowflake:

Expected Result:

Create the incremental model:

Run incremental model:

The result should match the staging model (3 rows).

Step 2: Testing Merge Method

Update staging model with changes:

Run staging model:

Verify staging data:

Expected Result:

Run incremental model:

Verify incremental model:

Expected Result:

Notice that record with ID 1 was updated with the new status and amount, while new record with ID 4 was inserted. Records with IDs 2 and 3 remained unchanged as they weren't in the staging data.


Common Issues and Solutions

Issue
Solution

Merge performance with large tables

Use incremental_predicates to limit the scope of the merge operation

Unexpected column updates

Use merge_update_columns or merge_exclude_columns for precise control

Duplicate key errors

Ensure your unique_key truly identifies records uniquely

The merge method offers the most flexibility for incremental models, but comes with higher computational costs. Consider it your default choice unless you have specific performance requirements or data characteristics that favor another method.

Last updated

Was this helpful?