Using Microbatch for Incremental Models

The microbatch method processes incremental updates as a series of smaller, time-based batches and is designed specifically for very large time-series datasets. It's particularly valuable when the data is too large to update reliably in a single operation.


When to Use the Microbatch Method

  • Very large time-series datasets, including tables with billions of rows

  • When single-transaction updates time out or run up against database resource limits

  • When the load needs to be reliable and resilient: one failed batch should not force a full re-run


Advantages and Trade-offs

Advantages

  • More efficient for large datasets

  • Better error handling

  • Improved resilience

  • Reduced memory usage

  • Can work around query timeouts

Trade-offs

  • More complex setup

  • Requires careful batch size tuning

  • Only available on certain platforms

  • Slightly more overhead than single-batch operations

  • Still in beta status

Example Implementation

{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_time',
    batch_size='day',
    begin='2024-01-01',
    lookback=1
  )
}}

SELECT 
    id,
    event_type,
    event_time,
    user_id,
    properties
FROM {{ ref('stg_events') }}

-- No is_incremental() filter is needed: dbt applies each batch's
-- event_time window to the model's inputs automatically.

In this example:

  • event_time tells dbt which column to use when assigning rows to batches

  • batch_size='day' splits the update into one batch per day, and begin sets the earliest date processed on the first run or a full refresh

  • Each batch is filtered and written independently, so the overall process is more resilient to timeouts and failures (see the upstream sketch below)
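
For dbt to restrict the upstream model to each batch's window, that upstream model needs an event_time config of its own. Below is a minimal sketch of what stg_events might look like, assuming it exposes the same event_time column; the source('app', 'raw_events') reference is hypothetical and only for illustration:

-- models/staging/stg_events.sql
{{
  config(
    materialized='view',
    event_time='event_time'  -- lets dbt filter this input to each batch's time window
  )
}}

SELECT
    id,
    event_type,
    event_time,
    user_id,
    properties
FROM {{ source('app', 'raw_events') }}  -- hypothetical raw source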


How It Works

When you run an incremental model with the microbatch method, dbt:

  1. Splits the time range to be processed into batches, based on batch_size (for example, one batch per day)

  2. Builds each batch as a separate query and transaction, filtering the model's inputs to that batch's time window

  3. Writes each batch independently, replacing that time slice of the target table

  4. Continues with the remaining batches even if some fail, and reports which batches succeeded and which failed

Because each batch is an independent unit of work, a single failure does not abort the whole run, and failed batches can be rerun with dbt retry. This is what makes the approach reliable for very large datasets where an all-or-nothing update would be fragile.
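
For intuition, the work dbt does for one daily batch corresponds roughly to the query below. This is an illustrative sketch, not the exact SQL dbt generates; the columns come from the example above, and the dates mark a single batch's window:

SELECT
    id,
    event_type,
    event_time,
    user_id,
    properties
FROM {{ ref('stg_events') }}
WHERE event_time >= '2024-06-01'  -- batch start (inclusive)
  AND event_time <  '2024-06-02'  -- batch end (exclusive)

The rows returned by that query then replace the 2024-06-01 slice of the target table.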


Configuration Options

The microbatch method has several unique configuration options:

{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',

    -- Microbatch-specific options
    event_time='created_at',  -- column used to assign rows to batches
    batch_size='day',         -- time granularity of each batch
    begin='2024-01-01',       -- earliest date processed on the first run or a full refresh
    lookback=1                -- number of recent batches to reprocess on each run
  )
}}
  • event_time – The column that assigns each row to a batch. Required; no default.

  • batch_size – The granularity of each batch: 'hour', 'day', 'month', or 'year'. Required; no default.

  • begin – The earliest point in time to process on the first run or a full refresh. Required; no default.

  • lookback – How many batches before the latest one to reprocess on each run, to pick up late-arriving data. Default: 1.
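
For very high-volume sources with late-arriving events, one common adjustment is to shrink the batch to an hour and widen the lookback. A minimal sketch, assuming the same event_time column and stg_events model as above:

{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_time',
    batch_size='hour',  -- smaller batches: each failure or retry covers less data
    begin='2024-01-01',
    lookback=4          -- re-run the trailing four hourly batches to catch late-arriving rows
  )
}}

SELECT * FROM {{ ref('stg_events') }}

The trade-off is more batches per run and therefore more per-batch overhead.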


Best Practices for the Microbatch Method

  1. Tune Batch Size - Pick the time grain (hour, day, month, or year) that balances per-batch cost against the total number of batches

  2. Set event_time on Upstream Models - Configure event_time on the models a microbatch model reads from, so each batch scans only its own time window

  3. Monitor Partial Failures - Set up alerting for runs where some batches fail while others succeed

  4. Consider Recovery Strategies - Plan how to reprocess failed batches; dbt retry re-runs only the batches that failed

The microbatch method is still evolving but offers a promising solution for handling very large incremental models where other methods might time out or consume too many resources. It's particularly valuable for high-volume event data or IoT datasets where individual transactions might exceed database limitations.
