Using Append for Incremental Models

The append method is the simplest incremental approach - it just adds new records to your table without updating existing ones. Think of it as adding new entries to a log book where historical entries are never changed. This method is perfect for event logging and similar use cases where historical data remains unchanged.

When to Use Append Method

Event logs
Immutable data
Time-series data without updates
When duplicates are acceptable
Audit trails
High-volume data capture with minimal processing

Advantages and Trade-offs

Advantages

Trade-offs

Fastest performance

No update capability

Simplest execution plan

Can create duplicate data

Minimal resource usage

Requires downstream deduplication if needed

No need for unique key

Not suitable for dimensions or reference data

Scales extremely well

Can lead to table size management challenges

Example Implementation

{{
  config(
    materialized='incremental',
    incremental_strategy='append'
  )
}}

SELECT 
    event_id,
    event_type,
    event_timestamp,
    user_id,
    page_url,
    properties
FROM {{ ref('stg_events') }}

{% if is_incremental() %}
    WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}

In this example:

New records are simply appended to the table
No updates are made to existing records
Note that unique_key is not required

How It Works

When you run an append-only incremental model, dbt:

Creates a temporary table with all the data from your query
Inserts all records from the temporary table into the target table
No deletion or updating of existing records occurs

This is the simplest and fastest approach, making it ideal for high-volume event data.

Step By Step Guide: Delete+Insert Method Implementation

Let's see how the delete+insert method works with a practical example.

Initial Setup

First, follow the same initial setup as described in the "Using Merge for Incremental Models" page:

Create the staging model with test data
Run the staging model
Create the initial incremental model
Run the incremental model

Once you have this setup in place, proceed with testing the delete+insert method:

Testing Append Method

Update incremental model config:

-- models/incremental_test.sql
{{
  config(
    materialized='incremental',
    incremental_strategy='append'
  )
}}

SELECT 
    id,
    status,
    amount,
    updated_at
FROM {{ ref('stg_test_source') }}

{% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

Run incremental model with full refresh to start clean:

dbt run --full-refresh --select incremental_test

Verify initial state:

SELECT 
    * 
FROM 
    your_database.your_schema.incremental_test 
ORDER BY 
    id

Update staging data to trigger append:

-- models/staging/stg_test_source.sql
WITH source_data AS (
    SELECT 1 as id,
           'inactive' as status,
           150 as amount,
           CURRENT_TIMESTAMP() as updated_at
    UNION ALL
    SELECT 5, 'newest', 500, CURRENT_TIMESTAMP()  -- Another new record
)
SELECT * FROM source_data

Run both models:

dbt run --select stg_test_source
dbt run --select incremental_test

Verify append results:

SELECT 
    * 
FROM 
    your_database.your_schema.incremental_test 
ORDER BY 
    updated_at

Expected Result:

ID | STATUS   | AMOUNT | UPDATED_AT
1  | active   | 100    | [old_timestamp]  -- Original record
2  | pending  | 200    | [old_timestamp]  -- Original record
3  | active   | 300    | [old_timestamp]  -- Original record
1  | inactive | 150    | [new_timestamp]  -- New record (duplicate ID)
5  | newest   | 500    | [new_timestamp]  -- New record

Notice that a new record with ID 1 was added, even though it already exists in the table. This is the key characteristic of append-only: it never updates existing records, only adds new ones.

Handling Duplicates with Append Method

Since append can create duplicate records, you may need strategies to handle this downstream:

Window Functions: Use window functions to identify the most recent version of each record:

WITH ranked_data AS (
  SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) as row_num
  FROM {{ ref('append_only_model') }}
)
SELECT * FROM ranked_data WHERE row_num = 1

Materialized Views: Some warehouses support materialized views that can automatically deduplicate
Downstream Models: Create downstream models that specifically handle deduplication logic

Best Practices for Append Method

Monitor Table Growth - Since records are only added, implement a strategy to manage table size
Consider Partitioning - For very large tables, partitioning by date can improve query performance
Plan for Deduplication - If uniqueness matters for downstream consumers, build deduplication into your models
Use with Time-Series Data - Append-only is ideal for time-series data where historical accuracy is important

The append method offers the best performance for high-volume data ingestion where updates to existing records aren't needed, making it the method of choice for event data, logs, and other immutable datasets.

PreviousUsing Delete+Insert for Incremental Models NextUsing Microbatch for Incremental Models

Last updated 5 months ago

Was this helpful?