Using Append for Incremental Models

The append method is the simplest incremental approach - it just adds new records to your table without updating existing ones. Think of it as adding new entries to a log book where historical entries are never changed. This method is perfect for event logging and similar use cases where historical data remains unchanged.


When to Use Append Method

  • Event logs

  • Immutable data

  • Time-series data without updates

  • When duplicates are acceptable

  • Audit trails

  • High-volume data capture with minimal processing


Advantages and Trade-offs

Advantages
Trade-offs

Fastest performance

No update capability

Simplest execution plan

Can create duplicate data

Minimal resource usage

Requires downstream deduplication if needed

No need for unique key

Not suitable for dimensions or reference data

Scales extremely well

Can lead to table size management challenges

Example Implementation

{{
  config(
    materialized='incremental',
    incremental_strategy='append'
  )
}}

SELECT 
    event_id,
    event_type,
    event_timestamp,
    user_id,
    page_url,
    properties
FROM {{ ref('stg_events') }}

{% if is_incremental() %}
    WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}

In this example:

  • New records are simply appended to the table

  • No updates are made to existing records

  • Note that unique_key is not required


How It Works

When you run an append-only incremental model, dbt:

  1. Creates a temporary table with all the data from your query

  2. Inserts all records from the temporary table into the target table

  3. No deletion or updating of existing records occurs

This is the simplest and fastest approach, making it ideal for high-volume event data.


Step By Step Guide: Delete+Insert Method Implementation

Let's see how the delete+insert method works with a practical example.

Initial Setup

First, follow the same initial setup as described in the "Using Merge for Incremental Models" page:

  1. Create the staging model with test data

  2. Run the staging model

  3. Create the initial incremental model

  4. Run the incremental model

Once you have this setup in place, proceed with testing the delete+insert method:

Testing Append Method

  1. Update incremental model config:

-- models/incremental_test.sql
{{
  config(
    materialized='incremental',
    incremental_strategy='append'
  )
}}

SELECT 
    id,
    status,
    amount,
    updated_at
FROM {{ ref('stg_test_source') }}

{% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
  1. Run incremental model with full refresh to start clean:

dbt run --full-refresh --select incremental_test
  1. Verify initial state:

SELECT 
    * 
FROM 
    your_database.your_schema.incremental_test 
ORDER BY 
    id
  1. Update staging data to trigger append:

-- models/staging/stg_test_source.sql
WITH source_data AS (
    SELECT 1 as id,
           'inactive' as status,
           150 as amount,
           CURRENT_TIMESTAMP() as updated_at
    UNION ALL
    SELECT 5, 'newest', 500, CURRENT_TIMESTAMP()  -- Another new record
)
SELECT * FROM source_data
  1. Run both models:

dbt run --select stg_test_source
dbt run --select incremental_test
  1. Verify append results:

SELECT 
    * 
FROM 
    your_database.your_schema.incremental_test 
ORDER BY 
    updated_at

Expected Result:

ID | STATUS   | AMOUNT | UPDATED_AT
1  | active   | 100    | [old_timestamp]  -- Original record
2  | pending  | 200    | [old_timestamp]  -- Original record
3  | active   | 300    | [old_timestamp]  -- Original record
1  | inactive | 150    | [new_timestamp]  -- New record (duplicate ID)
5  | newest   | 500    | [new_timestamp]  -- New record

Notice that a new record with ID 1 was added, even though it already exists in the table. This is the key characteristic of append-only: it never updates existing records, only adds new ones.


Handling Duplicates with Append Method

Since append can create duplicate records, you may need strategies to handle this downstream:

  1. Window Functions: Use window functions to identify the most recent version of each record:

WITH ranked_data AS (
  SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) as row_num
  FROM {{ ref('append_only_model') }}
)
SELECT * FROM ranked_data WHERE row_num = 1
  1. Materialized Views: Some warehouses support materialized views that can automatically deduplicate

  2. Downstream Models: Create downstream models that specifically handle deduplication logic


Best Practices for Append Method

  1. Monitor Table Growth - Since records are only added, implement a strategy to manage table size

  2. Consider Partitioning - For very large tables, partitioning by date can improve query performance

  3. Plan for Deduplication - If uniqueness matters for downstream consumers, build deduplication into your models

  4. Use with Time-Series Data - Append-only is ideal for time-series data where historical accuracy is important

The append method offers the best performance for high-volume data ingestion where updates to existing records aren't needed, making it the method of choice for event data, logs, and other immutable datasets.

Last updated

Was this helpful?