Using --defer in Paradime
The --defer feature in Paradime lets you leverage production data and schemas during development, significantly speeding up your dbt™ workflow. This guide walks you through using --defer, from basic usage to advanced features.
Basic Usage of --defer
With Paradime, you can develop continuously against production data and schemas. When you enable defer to prod for a dbt™ run command, Paradime automatically fetches the latest manifest.json from your production Bolt schedule, so you always work against the most current production data and schema.
Prerequisites
A Scheduler Connection to your data warehouse
An existing Bolt schedule
A Bolt schedule with "Defer to Production" enabled
How it works
When you use --defer, dbt™ resolves each ref() call based on two criteria:
Is the referenced node included in the current run's model selection?
Does the referenced node exist as a database object in your development environment?
If both answers are no, --defer resolves the ref() using the namespace from the state manifest of the specified schedule.
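The two-question rule above can be sketched in a few lines of Python. This is a simplified model for illustration, not dbt™'s implementation: the `Namespace` type, the `resolve_ref` function, and its set-based inputs are hypothetical, and the sketch assumes every referenced node is present in the schedule's state manifest. It renders a BigQuery-style relation name like the one in the compiled SQL example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    """Hypothetical stand-in for a database/schema pair."""
    database: str
    schema: str


def resolve_ref(
    node: str,
    selected: set[str],
    exists_in_dev: set[str],
    dev: Namespace,
    prod: Namespace,
) -> str:
    """Return the relation a ref() to `node` compiles to under --defer.

    Simplified model: assumes `node` exists in the state manifest.
    """
    # Defer only when the node is NOT selected AND NOT built in development.
    if node not in selected and node not in exists_in_dev:
        target = prod
    else:
        target = dev
    # Render a BigQuery-style fully qualified relation name.
    return f"`{target.database}`.`{target.schema}`.`{node}`"


if __name__ == "__main__":
    dev = Namespace("dbt-demo-project", "dbt_fabio")
    prod = Namespace("dbt-demo-project", "dbt_prod")
    # stg_orders is neither selected nor built in dev, so it resolves to prod.
    print(resolve_ref("stg_orders", {"customer_orders"}, set(), dev, prod))
```

If stg_orders were selected, or already existed in dbt_fabio, the same call would return the development relation instead.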
Using --defer in the Terminal
You can also use --defer from the Code IDE's integrated terminal:
```shell
dbt run --select <model> --defer --schedule-name=<schedule_name>
```
Example: Deferred vs Standard Run
To illustrate the effect of --defer, here is the compiled SQL from a deferred run:
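```sql
-- Deferred run: stg_orders resolves to the production schema (dbt_prod).
-- A standard run compiles this reference to `dbt-demo-project`.`dbt_fabio`.`stg_orders` instead.
with orders as (
    select * from `dbt-demo-project`.`dbt_prod`.`stg_orders`
),

final as (
    select
        customer_id,
        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders
    group by 1
)

select * from final
```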
Notice how the deferred run uses the production schema (dbt_prod), while the standard run uses the development schema (dbt_fabio).
Viewing the Deferred Schedule
After running a dbt™ command with --defer, you can view details of the production run used for deferral in the integrated terminal. The output includes a clickable URL to the Bolt UI with more information.
Advanced Use of --defer
Using --favor-state
The --favor-state flag provides additional control over how dbt™ resolves node references:
```shell
dbt run --select customer_orders --defer --favor-state --schedule-name=daily_run
```
This command tells dbt™ to prefer the state from the daily_run schedule for unselected models, even if those models already exist in your development environment.
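To contrast --favor-state with plain --defer, here is a simplified sketch of both rules side by side. The `resolve_schema` function and its set-based inputs are hypothetical; it models dbt™'s documented behavior (selected nodes resolve to development, unselected nodes are deferred) rather than reproducing dbt™'s code. Treating nodes missing from the state manifest as non-deferrable is an assumption of this sketch.

```python
def resolve_schema(
    node: str,
    selected: set[str],
    exists_in_dev: set[str],
    in_state_manifest: set[str],
    dev_schema: str,
    prod_schema: str,
    favor_state: bool = False,
) -> str:
    """Return the schema a ref() to `node` resolves to when --defer is set.

    Simplified, hypothetical model of dbt's documented behavior.
    """
    # Selected nodes are built in development, and nodes missing from the
    # state manifest have nothing to defer to (assumption of this sketch).
    if node in selected or node not in in_state_manifest:
        return dev_schema
    # --favor-state: defer unselected nodes even if they exist in development.
    if favor_state:
        return prod_schema
    # Plain --defer: defer only when the relation is missing from development.
    return dev_schema if node in exists_in_dev else prod_schema


if __name__ == "__main__":
    selected = {"customer_orders"}
    in_dev = {"stg_orders"}  # a possibly stale dev copy of stg_orders exists
    in_state = {"stg_orders", "customer_orders"}
    print(resolve_schema("stg_orders", selected, in_dev, in_state, "dbt_fabio", "dbt_prod"))  # dbt_fabio
    print(resolve_schema("stg_orders", selected, in_dev, in_state, "dbt_fabio", "dbt_prod", favor_state=True))  # dbt_prod
```

The only difference between the two calls is favor_state: with it, the stale development copy of stg_orders is ignored in favor of production.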
The "source_status" Method
The source_status method lets you build only the models downstream of sources whose data is fresher than it was in the run you defer to. Here's an example configuration:
```yaml
schedules:
  - name: hourly # the name of your schedule
    deferred_schedule:
      enabled: true # true to enable this schedule to use deferred state
      deferred_schedule_name: hourly # the Bolt schedule whose most recent successful run's manifest.json / sources.json / run_results.json is used for state comparison (here, self-deferring)
      successful_run_only: false # [optional] by default, Paradime looks for the last successful run; set this to false to defer to the last run of the selected schedule, regardless of its status
    schedule: "@hourly" # the schedule's cron configuration
    environment: production # the environment used to run the schedule; this is always production
    commands:
      - dbt source freshness # must run again to compare the current state to the previous state
      - dbt build --select source_status:fresher+
    owner_email: "[email protected]" # the email of the schedule owner
    slack_on: # when a notification is triggered; here, when the run either passes or fails
      - passed
      - failed
    slack_notify: # the channels/users that will be notified
      - "#data-alerts"
      - "@john"
    email_notify: # the email addresses that will be notified
      - "[email protected]"
      - "[email protected]"
    email_on: # when a notification is triggered; here, when the run either passes or fails
      - passed
      - failed
```
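This configuration runs dbt source freshness and then builds only the models affected by fresher source data: source_status:fresher+ selects every source whose data is fresher than in the deferred run's sources.json, plus everything downstream of it.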
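The selection above can be sketched in Python: compare max_loaded_at per source between the previous and current sources.json, then add all descendants for the + operator using a child_map like the one in manifest.json. This is a simplified illustration, not dbt™'s selector code; the function names and the example unique IDs are hypothetical, and treating sources absent from the previous run as fresher is an assumption of this sketch.

```python
from datetime import datetime


def load_freshness(sources_json: dict) -> dict[str, str]:
    """Map source unique_id -> max_loaded_at from a sources.json-style dict.

    Results without max_loaded_at (e.g. errored checks) are skipped.
    """
    return {
        r["unique_id"]: r["max_loaded_at"]
        for r in sources_json.get("results", [])
        if "max_loaded_at" in r
    }


def fresher_sources(previous: dict[str, str], current: dict[str, str]) -> set[str]:
    """Return sources whose max_loaded_at advanced since the previous run."""
    fresher = set()
    for unique_id, loaded_at in current.items():
        before = previous.get(unique_id)
        # Assumption: a source missing from the previous run counts as fresher.
        if before is None or datetime.fromisoformat(loaded_at) > datetime.fromisoformat(before):
            fresher.add(unique_id)
    return fresher


def with_descendants(roots: set[str], child_map: dict[str, list[str]]) -> set[str]:
    """Apply the `+` suffix: add every node downstream of the roots."""
    selected = set(roots)
    stack = list(roots)
    while stack:
        for child in child_map.get(stack.pop(), []):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected


if __name__ == "__main__":
    # Hypothetical artifacts: the orders source was loaded again since the last run.
    previous = {"results": [{"unique_id": "source.shop.raw.orders", "max_loaded_at": "2024-01-01T10:00:00+00:00"}]}
    current = {"results": [{"unique_id": "source.shop.raw.orders", "max_loaded_at": "2024-01-01T11:00:00+00:00"}]}
    child_map = {
        "source.shop.raw.orders": ["model.shop.stg_orders"],
        "model.shop.stg_orders": ["model.shop.customer_orders"],
        "model.shop.customer_orders": [],
    }
    roots = fresher_sources(load_freshness(previous), load_freshness(current))
    print(sorted(with_descendants(roots, child_map)))
```

Because freshness is compared against the deferred run's artifacts, dbt source freshness has to run first in the schedule so that the current sources.json exists.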
Additional schedule parameters available for deferred_schedule
When configuring a deferred schedule, you can use the following parameters in the deferred_schedule section:
| Parameter | Description | Example |
|---|---|---|
| enabled | [Required] Set this to true to enable deferral for the schedule. | true |
| deferred_schedule_name | [Required] The name of the Bolt schedule used to look for the most recent successful run artifacts (manifest.json, sources.json, run_results.json) for state comparison. It can be another schedule or the same schedule (self-deferring). | hourly |
| successful_run_only | [Optional] By default, Paradime looks for the last successful run. Set this to false to use the last run of the selected schedule, regardless of its status. | false |
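The parameter rules above can be sketched as follows. Paradime's internal run records and config parsing are not documented here, so the `Run` type, its status values, and both functions are hypothetical; the sketch only encodes what the table states: enabled and deferred_schedule_name are required, and successful_run_only defaults to looking for the last successful run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """Hypothetical record of one Bolt run of a schedule."""
    run_id: int
    status: str  # hypothetical values, e.g. "success" or "error"


def read_deferred_config(cfg: dict) -> Optional[tuple[str, bool]]:
    """Check a deferred_schedule block; return (schedule name, successful_run_only)."""
    for key in ("enabled", "deferred_schedule_name"):
        if key not in cfg:
            raise ValueError(f"deferred_schedule.{key} is required")
    if cfg["enabled"] is not True:
        return None  # deferral is switched off for this schedule
    # successful_run_only is optional; the documented default is true.
    return cfg["deferred_schedule_name"], cfg.get("successful_run_only", True)


def pick_deferred_run(runs: list[Run], successful_run_only: bool = True) -> Optional[Run]:
    """Pick the run whose artifacts are used for state comparison.

    `runs` is ordered oldest to newest. By default only successful runs
    qualify; with successful_run_only=False the latest run qualifies.
    """
    for run in reversed(runs):
        if not successful_run_only or run.status == "success":
            return run
    return None


if __name__ == "__main__":
    runs = [Run(1, "success"), Run(2, "error")]
    print(pick_deferred_run(runs).run_id)  # 1
    print(pick_deferred_run(runs, successful_run_only=False).run_id)  # 2
```

With successful_run_only set to false, a failed latest run is still used, which is why the example schedule above can defer even after an unsuccessful hourly run.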