Using Append for Incremental Models
The append method is the simplest incremental approach - it just adds new records to your table without updating existing ones. Think of it as adding new entries to a log book where historical entries are never changed. This method is perfect for event logging and similar use cases where historical data remains unchanged.
When to Use Append Method
Event logs
Immutable data
Time-series data without updates
When duplicates are acceptable
Audit trails
High-volume data capture with minimal processing
Advantages and Trade-offs
Fastest performance
No update capability
Simplest execution plan
Can create duplicate data
Minimal resource usage
Requires downstream deduplication if needed
No need for unique key
Not suitable for dimensions or reference data
Scales extremely well
Can lead to table size management challenges
Example Implementation
In this example:
New records are simply appended to the table
No updates are made to existing records
Note that
unique_key
is not required
How It Works
When you run an append-only incremental model, dbt:
Creates a temporary table with all the data from your query
Inserts all records from the temporary table into the target table
No deletion or updating of existing records occurs
This is the simplest and fastest approach, making it ideal for high-volume event data.
Step By Step Guide: Delete+Insert Method Implementation
Let's see how the delete+insert method works with a practical example.
Initial Setup
First, follow the same initial setup as described in the "Using Merge for Incremental Models" page:
Create the staging model with test data
Run the staging model
Create the initial incremental model
Run the incremental model
Once you have this setup in place, proceed with testing the delete+insert method:
Testing Append Method
Update incremental model config:
Run incremental model with full refresh to start clean:
Verify initial state:
Update staging data to trigger append:
Run both models:
Verify append results:
Expected Result:
Notice that a new record with ID 1 was added, even though it already exists in the table. This is the key characteristic of append-only: it never updates existing records, only adds new ones.
Handling Duplicates with Append Method
Since append can create duplicate records, you may need strategies to handle this downstream:
Window Functions: Use window functions to identify the most recent version of each record:
Materialized Views: Some warehouses support materialized views that can automatically deduplicate
Downstream Models: Create downstream models that specifically handle deduplication logic
Best Practices for Append Method
Monitor Table Growth - Since records are only added, implement a strategy to manage table size
Consider Partitioning - For very large tables, partitioning by date can improve query performance
Plan for Deduplication - If uniqueness matters for downstream consumers, build deduplication into your models
Use with Time-Series Data - Append-only is ideal for time-series data where historical accuracy is important
The append method offers the best performance for high-volume data ingestion where updates to existing records aren't needed, making it the method of choice for event data, logs, and other immutable datasets.
Last updated
Was this helpful?