Testing Source Freshness
Ensuring your source data is up-to-date is critical for reliable analytics. dbt's source freshness checks allow you to monitor and verify when data was last loaded, helping your team maintain data quality and meet service level agreements (SLAs).
Why Test Source Freshness?
Source freshness checks serve several important purposes:
Detect stale data
Identify when source data hasn't been updated within expected timeframes
Enforce SLAs
Ensure that data meets freshness requirements for business operations
Prevent incorrect analytics
Avoid building reports on outdated information
Monitor pipeline health
Get early warnings when data ingestion processes fail
How Source Freshness Works
dbt's source freshness checking works by:
- Examining a timestamp column in your source tables 
- Comparing that timestamp to the current time 
- Evaluating the difference against your defined thresholds 
Configuring Freshness Checks
To enable freshness checks, you need to add two key elements to your sources.yml file:
- A - loaded_at_fieldthat identifies the timestamp column
- A - freshnessconfiguration that defines your thresholds
Here's a basic example:
version: 2
sources:
  - name: jaffle_shop
    database: raw  
    freshness:
      warn_after: {count: 12, period: hour}  # Warn if data is over 12 hours old
      error_after: {count: 24, period: hour}  # Error if data is over 24 hours old
    loaded_at_field: _etl_loaded_at  # Column storing the last update timestamp
    tables:
      - name: raw_orders
      - name: raw_customers  # Inherits default freshness settingsUnderstanding the Configuration
- loaded_at_field: The column containing the timestamp when data was loaded
- warn_after: When to issue a warning about stale data
- error_after: When to report an error about stale data (typically more lenient than warnings)
Table-Specific Freshness Settings
You can override source-level freshness settings for specific tables:
sources:
  - name: jaffle_shop
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _etl_loaded_at
    tables:
      - name: raw_orders
        freshness:  
          warn_after: {count: 6, period: hour}  # More strict for orders table
          error_after: {count: 12, period: hour}
      - name: raw_customers  # Inherits source's default freshness
      - name: raw_product_skus
        freshness: null  # Disable freshness checks for this tableRunning Freshness Checks
To check the freshness of your source data, run:
dbt source freshnessThis command:
- Queries each source table for the most recent - loaded_at_fieldvalue
- Compares that timestamp against the current time 
- Evaluates it against your thresholds 
- Reports success, warning, or error for each source 
Example Output
03:33:31 | Concurrency: 1 threads (target='dev')
03:33:31 | 
03:33:31 | 1 of 2 START freshness of jaffle_shop.raw_orders ................... [RUN]
03:33:32 | 1 of 2 WARN freshness of jaffle_shop.raw_orders .................... [WARN in 0.98s]
03:33:32 | 2 of 2 START freshness of jaffle_shop.raw_customers ................ [RUN]
03:33:33 | 2 of 2 PASS freshness of jaffle_shop.raw_customers ................. [PASS in 0.82s]
03:33:33 | 
03:33:33 | Finished running 2 source freshness checks in 1.99s.Integrating Freshness Checks into Workflows
Freshness checks can be integrated into your data pipeline in several ways:
Scheduled Checks
Run freshness checks on a regular schedule to proactively monitor your data:
# Example cron job to check freshness every hour
0 * * * * cd /path/to/project && dbt source freshnessPre-run Validation
Execute freshness checks before running models to prevent building on stale data:
#!/bin/bash
# Exit with error if any source freshness check fails
dbt source freshness || exit 1
# Only run models if freshness checks pass
dbt runCI/CD Pipeline Integration
Include freshness checks in your CI/CD workflows:
# Example GitHub Actions workflow step
- name: Check source freshness
  run: dbt source freshness --target prodOptimizing Freshness Checks for Large Tables
For very large datasets, running freshness checks can be resource-intensive. You can optimize them using filters:
sources:
  - name: analytics
    tables:
      - name: events
        loaded_at_field: created_at
        freshness:
          warn_after: {count: 1, period: hour}
          error_after: {count: 6, period: hour}
          filter: "date_trunc('day', created_at) >= dateadd('day', -3, current_date)"The filter clause limits the rows that dbt examines when checking freshness, improving performance for large tables.
Handling Different Data Loading Patterns
Different sources may have various data loading patterns that affect how you configure freshness:
Real-time streaming
Short windows (minutes)
warn_after: {count: 15, period: minute}
Hourly batch updates
Medium windows (hours)
warn_after: {count: 2, period: hour}
Daily ETL jobs
Longer windows (days)
warn_after: {count: 1, period: day}
Weekly data delivery
Extended windows
warn_after: {count: 8, period: day}
Troubleshooting Freshness Issues
If your freshness checks are failing, consider these common issues:
- Incorrect timestamp column: Verify that - loaded_at_fieldis the right column
- Timezone differences: Check if there are timezone discrepancies between source timestamps and dbt 
- Data loading failures: Investigate upstream ETL/ELT processes 
- Unrealistic expectations: Adjust thresholds to match actual data loading patterns 
Best Practices
Set realistic thresholds
Align freshness requirements with business needs and actual data load frequencies
Use appropriate column
Choose a column that truly represents when data was last updated (ETL timestamp preferred over source timestamps)
Monitor trends
Track freshness over time to identify deteriorating pipeline performance
Disable when appropriate
Use freshness: null for static reference tables that rarely change
Document expectations
Include freshness SLAs in your data documentation
Automating Source Generation with DinoAI
With DinoAI (Paradime's AI Agent), you can automatically generate source definitions with appropriate freshness configurations. See step by step instructions for more details.
Last updated
Was this helpful?