Re-executes the last dbt™ command from the point of failure

This template outlines how to configure Paradime's scheduler to retry failed dbt™ models or tests. By implementing this solution, you can create a resilient data pipeline that re-runs failed models and tests from previous executions.


Default Configuration

Schedule Settings

| Setting | Value | Explanation |
| --- | --- | --- |
| Schedule Type | Deferred | Defers to an existing production schedule so that failed runs can be retried from the point of failure |
| Schedule Name | dbt retry | Descriptive name that indicates the schedule's purpose |
| Deferred schedule | hourly run | Specifies the production schedule you want to re-run from the last point of failure |
| Git Branch | main | Uses your default production branch to ensure you're always running the latest approved code |
| Last run type (for comparison) | Last Run | Ensures the schedule defers to the last run that completed with errors |
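As a sketch, the settings above could be expressed in a `paradime_schedules.yml` file along the following lines. The field names here are illustrative assumptions, not the exact schema; refer to Paradime's scheduler documentation for the authoritative format.

```yaml
schedules:
  - name: dbt retry            # Schedule Name
    schedule: "OFF"            # no cron cadence (see Trigger Type below)
    git_branch: main           # default production branch
    deferred_schedule:
      enabled: true
      deferred_schedule_name: hourly run   # production schedule to re-run from failure
    commands:
      - dbt retry              # re-execute from the last point of failure
```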

Command Settings

The template uses a single command to re-run models and tests from the last point of failure:

  • dbt retry: Re-executes dbt™ models and tests from the last point of failure for the deferred schedule name set in the configuration.

This command ensures that all your models and tests that failed in the last run are re-executed.
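For context, `dbt retry` (available in dbt™ 1.6+) works by reading the `run_results.json` artifact from the previous invocation and re-executing only the nodes that failed or were skipped. A hypothetical local sequence looks like this:

```shell
# First run fails partway through; downstream nodes are skipped
dbt build

# Re-execute only the failed and skipped nodes from that run,
# based on the run_results.json artifact in the target/ directory
dbt retry
```

Nodes that already succeeded in the first run are not rebuilt, which is what makes the retry cheap relative to a full re-run.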

For custom command configurations, see Command Settings documentation.

Trigger Type

  • Type: Scheduled Run (Cron)

  • Cron Schedule: OFF (this schedule is not run on a cron cadence; it is triggered on demand, deferring to the last run of the production schedule that completed with errors)

For custom Trigger configurations, see Trigger Types documentation.

Notification Settings

  • Email Alerts:

    • Success: Confirms all models were re-built and tested successfully, letting you know your data pipeline is healthy

    • Failure: Immediately alerts you when models fail to build or tests fail, allowing quick response to issues

    • SLA Breach: Alerts when runs take longer than the set duration (default: 2 hours), helping identify performance degradation

For custom notification configurations, see Notification Settings documentation.

Use Cases

Primary Use Cases

  • Handling Intermittent Connection Issues: Automatically retry models that failed due to temporary connection problems with your data warehouse

  • Recovering from Resource Constraints: Retry models that failed due to memory limitations or query timeouts during peak usage times

  • Addressing Data Availability Delays: Recover from failures caused by source data not being available at the expected time

  • Managing Dependencies: Ensure downstream models get rebuilt after their dependencies are successfully retried

  • Production Recovery: Quickly restore production data pipelines after unexpected failures without manual intervention
