Variables and Parameters

Variables allow you to make your dbt project more dynamic and configurable by passing values at runtime or setting them in configuration files. They enable you to create flexible data transformations that can adapt to different environments, use cases, and scenarios.

Understanding dbt Variables

Variables in dbt serve two primary purposes:

  1. Make code reusable - Define values once and reference them throughout your project

  2. Enable flexibility - Change behavior without modifying code

There are several ways to define and use variables in dbt:

  • Project variables - Defined in dbt_project.yml

  • Command-line variables - Passed at runtime

  • Environment variables - Read from the shell environment with the env_var() Jinja function


Defining Variables in dbt_project.yml

The simplest way to define variables is in your dbt_project.yml file:

vars:
  # Simple scalar values
  start_date: '2020-01-01'
  end_date: '2022-12-31'
  
  # Lists
  excluded_countries: ['test', 'demo', 'internal']
  
  # Dictionaries
  partner_sales_targets: {
    'tier1': 1000000,
    'tier2': 500000,
    'tier3': 100000
  }
  
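  # Per-target settings. These are ordinary dictionaries named after each target;
  # dbt does not pick one automatically, so look them up with var(target.name).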
  dev:
    row_limit: 100
    debug_mode: true
  prod:
    row_limit: null
    debug_mode: false

These variables become available throughout your project via the var() function.


Using Variables in Models

Once defined, you can reference variables in your models using the var() function:

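-- models/reporting/monthly_sales.sql
{# Per-target settings are plain dictionaries, so pick the one for the active target #}
{% set target_vars = var(target.name, {}) %}

SELECT
  date_trunc('month', order_date) as month,
  SUM(amount) as monthly_sales
FROM {{ ref('stg_orders') }}
WHERE 
  order_date >= '{{ var("start_date") }}'
  AND order_date <= '{{ var("end_date") }}'
GROUP BY 1
{% if target_vars.get('row_limit') %}
LIMIT {{ target_vars.get('row_limit') }}
{% endif %}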

The var() function has two parameters:

  1. The variable name

  2. An optional default value that's used if the variable isn't defined

-- Using a default value
SELECT * FROM {{ ref('stg_users') }}
WHERE status = '{{ var("user_status", "active") }}'

Variable Behavior

When you use the var() function:

  • It will use the variable from dbt_project.yml if defined

  • Command-line variables override values from dbt_project.yml

  • If no variable is found and no default is specified, dbt will raise an error

  • Dictionaries named after targets (such as dev and prod) are not applied automatically; select the one for the active target explicitly, for example with var(target.name)
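The lookup order in this list can be modelled in a few lines of Python. This is only an illustrative sketch, not dbt's actual implementation: resolve_var, cli_vars, and project_vars are hypothetical names, and the two dictionaries stand in for the --vars payload and the vars: block of dbt_project.yml.

```python
# Illustrative model of the var() lookup order; not dbt's real implementation.
_MISSING = object()  # sentinel, so that None can still be a legitimate default


def resolve_var(name, cli_vars, project_vars, default=_MISSING):
    """Return the value that var(name, default) resolves to in this model."""
    if name in cli_vars:          # 1. a value passed with --vars wins
        return cli_vars[name]
    if name in project_vars:      # 2. then the value from dbt_project.yml
        return project_vars[name]
    if default is not _MISSING:   # 3. then the default given to var()
        return default
    # 4. nothing found: dbt raises an error; modelled here as KeyError
    raise KeyError(f"Required var '{name}' not found")


project_vars = {"start_date": "2020-01-01", "end_date": "2022-12-31"}
cli_vars = {"start_date": "2023-01-01"}

print(resolve_var("start_date", cli_vars, project_vars))             # from --vars
print(resolve_var("end_date", cli_vars, project_vars))               # from the project file
print(resolve_var("user_status", cli_vars, project_vars, "active"))  # from the default
```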


Passing Variables at Runtime

For maximum flexibility, pass variables at runtime using the --vars flag:

dbt run --vars '{"start_date": "2023-01-01", "end_date": "2023-03-31"}'

You can pass complex structures too:

dbt run --vars '{"regions": ["north", "south"], "include_test_data": false}'

Runtime variables override any variables defined in dbt_project.yml.
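When dbt is launched from a script, quoting the --vars payload by hand is error-prone. The sketch below, which assumes Python is available and uses a hypothetical helper named build_dbt_command, serializes a dictionary with json.dumps and returns an argument list; it only builds the command and does not run dbt.

```python
import json
import shlex


def build_dbt_command(vars_dict, select=None):
    """Return the argv list for `dbt run` with vars_dict passed via --vars."""
    cmd = ["dbt", "run"]
    if select:
        cmd += ["--select", select]
    # json.dumps takes care of quoting and escaping inside the payload
    cmd += ["--vars", json.dumps(vars_dict)]
    return cmd


cmd = build_dbt_command({"regions": ["north", "south"], "include_test_data": False})
print(shlex.join(cmd))  # shell-quoted form, suitable for copy and paste
```

Passing the list to subprocess.run(cmd, check=True) avoids a shell entirely, so no extra quoting is needed.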


Working with Environment Variables

You can access environment variables using the env_var Jinja function:

-- Configuring a model to use environment variables
{{ 
  config(
    schema=env_var('DBT_SCHEMA', 'analytics')
  ) 
}}

SELECT * FROM {{ ref('stg_orders') }}

This is useful for values that vary by environment, such as schema names.

Security Note

Avoid calling env_var() with secrets inside models. The value is rendered into the compiled SQL, so it can end up in logs and in the target/ directory. Keep credentials in profiles.yml, where env_var() is the usual way to read them, and consider naming them with the DBT_ENV_SECRET prefix, which dbt scrubs from logs and permits only in profiles.yml and packages.yml.
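To supply a value such as DBT_SCHEMA when dbt is started from a script, one option is to pass a modified copy of the environment to the child process. The following is a minimal sketch under that assumption; dbt_env is a hypothetical helper and analytics_ci is a made-up schema name.

```python
import os


def dbt_env(overrides):
    """Return a copy of the current environment with overrides applied."""
    env = dict(os.environ)
    # Environment values must be strings, so coerce numbers and booleans
    env.update({key: str(value) for key, value in overrides.items()})
    return env


env = dbt_env({"DBT_SCHEMA": "analytics_ci"})  # hypothetical value
# Hand the copy to dbt without mutating this process's own environment, e.g.:
# subprocess.run(["dbt", "run"], env=env, check=True)
print(env["DBT_SCHEMA"])
```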


Advanced Variable Techniques

Conditional Logic with Variables

Variables allow you to implement conditional logic in your models:

{% if var('data_source') == 'api' %}
  SELECT * FROM {{ ref('stg_api_data') }}
{% else %}
  SELECT * FROM {{ ref('stg_warehouse_data') }}
{% endif %}

Dynamic Filtering

Create flexible filtering based on variable values:

SELECT
  *
FROM {{ ref('stg_transactions') }}
WHERE 1=1
  {% if var('filter_by_date', false) %}
  AND transaction_date BETWEEN '{{ var("start_date") }}' AND '{{ var("end_date") }}'
  {% endif %}
  
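  {# Skip the clause when the list is empty, because IN () is not valid SQL #}
  {% set countries = var('countries', []) %}
  {% if var('filter_by_country', false) and countries %}
  AND country IN (
    {% for country in countries %}
      '{{ country }}'{% if not loop.last %},{% endif %}
    {% endfor %}
  )
  {% endif %}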
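To see what the country loop produces, its logic can be mirrored in plain Python. This is a rough sketch rather than dbt's rendering engine: country_filter is a hypothetical name, whitespace is simplified, and returning no clause for an empty list is a choice made here because IN () is not valid SQL.

```python
def country_filter(countries):
    """Return the AND clause for the given countries, or '' when there are none."""
    if not countries:
        return ""  # emit no clause at all rather than an empty IN ()
    # Values are quoted but not escaped, exactly like the template loop
    quoted = ", ".join(f"'{country}'" for country in countries)
    return f"AND country IN ({quoted})"


print(country_filter(["US", "CA"]))  # AND country IN ('US', 'CA')
print(repr(country_filter([])))      # ''
```

Because the values are interpolated as-is, a value containing a single quote would break the statement, so the list should only come from trusted input.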

Date/Time Variables

A common pattern for incremental models is using variables for date ranges:

{% set run_date = var('run_date', modules.datetime.date.today().strftime('%Y-%m-%d')) %}

SELECT 
  *
FROM {{ source('events', 'daily_events') }}
WHERE 
  event_date = '{{ run_date }}'
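Inside dbt's Jinja context, modules.datetime is Python's own datetime module, so the default above can be reproduced directly in Python. The sketch below assumes a hypothetical helper, resolve_run_date, and adds a format check that the model itself does not perform.

```python
from datetime import date, datetime


def resolve_run_date(cli_vars):
    """Return run_date from cli_vars, else today's date, as a YYYY-MM-DD string."""
    run_date = cli_vars.get("run_date", date.today().strftime("%Y-%m-%d"))
    # Fail early on a malformed value (an addition here, not part of the model)
    datetime.strptime(run_date, "%Y-%m-%d")
    return run_date


print(resolve_run_date({"run_date": "2023-06-01"}))  # explicit override
print(resolve_run_date({}))                          # falls back to today
```

An explicit run_date is what makes a backfill repeatable: rerunning with the same value selects the same day, whereas the default changes every day.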

Best Practices for Variables

  • Set meaningful defaults - Provide sensible default values to make your code more robust

  • Use descriptive names - Choose clear, explicit variable names that explain purpose

  • Document variables - Add comments in dbt_project.yml to explain each variable's purpose

  • Consistent formatting - Maintain consistent casing and naming conventions

  • Avoid hardcoding - Use variables instead of hardcoding values that might change

Example: Well-Structured Variables

vars:
  # Analysis date range - used for filtering transaction data
  # Format: YYYY-MM-DD
  analysis_start_date: '2023-01-01'  # Inclusive
  analysis_end_date: '2023-12-31'    # Inclusive
  
  # Revenue recognition settings
  rev_rec_delay_days: 14             # Days to delay revenue recognition
  include_refunds: false             # Set to true to include refunded transactions
  
  # Per-target settings (plain dictionaries; select one with var(target.name))
  dev:
    debug_mode: true                 # Enables additional logging
    data_sample_pct: 10              # Only process 10% of data in dev
  prod:
    debug_mode: false
    data_sample_pct: 100             # Process all data in prod

Common Use Cases

Environment-Specific Configuration

Define different behavior based on your deployment environment:

# dbt_project.yml
vars:
  dev:
    schema_prefix: 'dev_'
    row_limit: 1000
  prod:
    schema_prefix: ''
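    row_limit: null

-- models/model.sql
{# dev and prod are plain dictionaries, so select the one for the active target #}
{% set target_vars = var(target.name, {}) %}

{{ 
  config(
    schema=target_vars.get('schema_prefix', 'dev_') ~ 'marketing'
  ) 
}}

SELECT * FROM {{ ref('stg_data') }}
{% if target_vars.get('row_limit') %}
LIMIT {{ target_vars.get('row_limit') }}
{% endif %}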

Parameterized Reporting

Create reports with customizable parameters:

-- models/daily_sales_report.sql
{% set date_column = var('date_column', 'order_date') %}
{% set granularity = var('granularity', 'day') %}

SELECT
  DATE_TRUNC('{{ granularity }}', {{ date_column }}) as period,
  SUM(amount) as sales
FROM {{ ref('fct_orders') }}
GROUP BY 1
ORDER BY 1

Then run with different settings:

dbt run --select daily_sales_report --vars '{"granularity": "month", "date_column": "shipped_date"}'
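Because granularity and date_column are interpolated straight into the SQL, it can help to validate them before invoking dbt. The following is one possible sketch, not a dbt feature: report_vars is a hypothetical helper, and both allow-lists are assumptions that would need to match your warehouse and the columns of fct_orders.

```python
# Assumed allow-lists; adjust to your warehouse and to the columns of fct_orders
ALLOWED_GRANULARITIES = {"day", "week", "month", "quarter", "year"}
ALLOWED_DATE_COLUMNS = {"order_date", "shipped_date"}


def report_vars(granularity="day", date_column="order_date"):
    """Validate the report parameters and return them as a --vars dictionary."""
    if granularity not in ALLOWED_GRANULARITIES:
        raise ValueError(f"Unsupported granularity: {granularity!r}")
    if date_column not in ALLOWED_DATE_COLUMNS:
        raise ValueError(f"Unsupported date column: {date_column!r}")
    return {"granularity": granularity, "date_column": date_column}


print(report_vars("month", "shipped_date"))
```

The returned dictionary can then be serialized to JSON and passed to --vars.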

By effectively using variables in your dbt project, you create more flexible, maintainable, and reusable data transformations that can easily adapt to different needs and environments without code changes.
