Using Microbatch for Incremental Models

The microbatch method processes incremental updates as a series of smaller, time-based batches and is designed specifically for very large time-series datasets. It's particularly valuable when the data is too large to update reliably in a single operation.


When to Use the Microbatch Method

  • Very large time-series datasets, including tables with billions of rows

  • When single-transaction updates time out or run up against database resource limits

  • When the load needs to be reliable and resilient: one failed batch should not force a full re-run


Advantages and Trade-offs

Advantages

  • More efficient for large datasets

  • Better error handling

  • Improved resilience

  • Reduced memory usage

  • Can work around query timeouts

Trade-offs

  • More complex setup

  • Requires careful batch size tuning

  • Only available on certain platforms

  • Slightly more overhead than single-batch operations

  • Still in beta status

Example Implementation

{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_time',
    batch_size='day',
    begin='2024-01-01',
    lookback=1
  )
}}

SELECT 
    id,
    event_type,
    event_time,
    user_id,
    properties
FROM {{ ref('stg_events') }}

-- No is_incremental() filter is needed: dbt applies each batch's
-- event_time window to the model's inputs automatically.

In this example:

  • event_time tells dbt which column to use when assigning rows to batches

  • batch_size='day' splits the update into one batch per day, and begin sets the earliest date processed on the first run or a full refresh

  • Each batch is filtered and written independently, so the overall process is more resilient to timeouts and failures (see the upstream sketch below)
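
For dbt to restrict the upstream model to each batch's window, that upstream model needs an event_time config of its own. Below is a minimal sketch of what stg_events might look like, assuming it exposes the same event_time column; the source('app', 'raw_events') reference is hypothetical and only for illustration:

-- models/staging/stg_events.sql
{{
  config(
    materialized='view',
    event_time='event_time'  -- lets dbt filter this input to each batch's time window
  )
}}

SELECT
    id,
    event_type,
    event_time,
    user_id,
    properties
FROM {{ source('app', 'raw_events') }}  -- hypothetical raw source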


How It Works

When you run an incremental model with the microbatch method, dbt:

  1. Splits the time range to be processed into batches, based on batch_size (for example, one batch per day)

  2. Builds each batch as a separate query and transaction, filtering the model's inputs to that batch's time window

  3. Writes each batch independently, replacing that time slice of the target table

  4. Continues with the remaining batches even if some fail, and reports which batches succeeded and which failed

Because each batch is an independent unit of work, a single failure does not abort the whole run, and failed batches can be rerun with dbt retry. This is what makes the approach reliable for very large datasets where an all-or-nothing update would be fragile.
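
For intuition, the work dbt does for one daily batch corresponds roughly to the query below. This is an illustrative sketch, not the exact SQL dbt generates; the columns come from the example above, and the dates mark a single batch's window:

SELECT
    id,
    event_type,
    event_time,
    user_id,
    properties
FROM {{ ref('stg_events') }}
WHERE event_time >= '2024-06-01'  -- batch start (inclusive)
  AND event_time <  '2024-06-02'  -- batch end (exclusive)

The rows returned by that query then replace the 2024-06-01 slice of the target table.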


Configuration Options

The microbatch method has several unique configuration options:

{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',

    -- Microbatch-specific options
    event_time='created_at',  -- column used to assign rows to batches
    batch_size='day',         -- time granularity of each batch
    begin='2024-01-01',       -- earliest date processed on the first run or a full refresh
    lookback=1                -- number of recent batches to reprocess on each run
  )
}}
  • event_time – The column that assigns each row to a batch. Required; no default.

  • batch_size – The granularity of each batch: 'hour', 'day', 'month', or 'year'. Required; no default.

  • begin – The earliest point in time to process on the first run or a full refresh. Required; no default.

  • lookback – How many batches before the latest one to reprocess on each run, to pick up late-arriving data. Default: 1.
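
For very high-volume sources with late-arriving events, one common adjustment is to shrink the batch to an hour and widen the lookback. A minimal sketch, assuming the same event_time column and stg_events model as above:

{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_time',
    batch_size='hour',  -- smaller batches: each failure or retry covers less data
    begin='2024-01-01',
    lookback=4          -- re-run the trailing four hourly batches to catch late-arriving rows
  )
}}

SELECT * FROM {{ ref('stg_events') }}

The trade-off is more batches per run and therefore more per-batch overhead.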


Best Practices for the Microbatch Method

  1. Tune Batch Size - Pick the time grain (hour, day, month, or year) that balances per-batch cost against the total number of batches

  2. Set event_time on Upstream Models - Configure event_time on the models a microbatch model reads from, so each batch scans only its own time window

  3. Monitor Partial Failures - Set up alerting for runs where some batches fail while others succeed

  4. Consider Recovery Strategies - Plan how to reprocess failed batches; dbt retry re-runs only the batches that failed

The microbatch method is still evolving but offers a promising solution for handling very large incremental models where other methods might time out or consume too many resources. It's particularly valuable for high-volume event data or IoT datasets where individual transactions might exceed database limitations.
