Testing Source Freshness

Ensuring that your source data is up to date is critical for reliable analytics and decision-making. dbt™ provides source freshness checks to verify when data was last updated, helping teams monitor pipeline health and ensure data meets expected SLAs (Service Level Agreements).

Why Test Source Freshness?

Source freshness checks help with the following:

  • Detect stale data early – Avoid basing decisions on outdated information.

  • Enforce SLAs – Ensure that critical tables meet data freshness agreements.

  • Improve data quality – Identify pipeline issues before they impact downstream models.

  • Integrate with automated workflows – Flag potential failures in CI/CD pipelines.


Configuring Freshness Checks in sources.yml

To enable source freshness checks, add a freshness block inside the sources.yml file. Below is an example configuration:

Example: Basic Freshness Configuration

version: 2

sources:
  - name: jaffle_shop
    database: raw  
    freshness:
      warn_after: {count: 12, period: hour}  # Warn if data is over 12 hours old
      error_after: {count: 24, period: hour}  # Error if data is over 24 hours old
    loaded_at_field: _etl_loaded_at  # Column storing the last update timestamp
    tables:
      - name: raw_orders
        freshness:  
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}

      - name: raw_customers  # Inherits default freshness settings

      - name: raw_product_skus
        freshness: null  # No freshness check for this table

Key Configuration Components

  • freshness block – Defines when dbt should warn or error based on data staleness.

  • warn_after / error_after – Set time thresholds before issuing a warning or error.

  • loaded_at_field – Specifies the column that records when data was last loaded.

  • Table-level overrides – Apply stricter or looser freshness rules for specific tables.

💡 Tip: Use freshness: null to disable freshness checks for a specific table.


Running Freshness Checks in dbt™

Once configured, freshness checks can be executed using the following command:

dbt source freshness  # Checks freshness for all configured sources
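
You can also limit the check to a subset of sources or tables using node selection (a typical example; the exact selector syntax may vary slightly between dbt versions):

dbt source freshness --select source:jaffle_shop  # Check every table in the jaffle_shop source
dbt source freshness --select source:jaffle_shop.raw_orders  # Check a single table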

In either case, dbt will:

  1. Query the loaded_at_field for the latest timestamp.

  2. Compare it against the current time to assess data staleness.

  3. Return a success, warning, or error based on the defined thresholds.
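
Conceptually, dbt issues a query along these lines for each source table (a simplified sketch; the exact SQL depends on your adapter and configuration):

select
  max(_etl_loaded_at) as max_loaded_at,
  current_timestamp as snapshotted_at
from raw.jaffle_shop.raw_orders

The gap between snapshotted_at and max_loaded_at is then compared against the warn_after and error_after thresholds.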


Optimizing Freshness Checks for Large Tables

For large datasets, running a freshness check against the entire table can be resource-intensive. To improve efficiency, add a filter to the freshness block so that only recent records are scanned:

tables:
  - name: large_table
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
      filter: "_etl_loaded_at >= date_sub(current_date, interval 30 day)"  # Check only the last 30 days (BigQuery-style syntax)

Why Use Filtering?

  • Improves performance by checking only recent records.

  • Reduces query costs on large datasets.


Best Practices for Source Freshness Checks

To maximize the effectiveness of freshness testing, follow these best practices:

  • Set Realistic Thresholds – Define warning and error limits that match data ingestion SLAs.

  • Monitor Regularly – Integrate freshness checks into automated workflows (see the CI sketch after this list).

  • Document Expectations – Clearly define freshness requirements for each source.

  • Use Column-Level Constraints – Combine freshness checks with schema tests (e.g., not_null); see the example after this list.

  • Exclude Unnecessary Tables – Disable freshness checks for static or rarely updated sources.
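
For automated monitoring, a scheduled job in your orchestrator or CI system can run the freshness command and fail on error-level staleness. The sketch below assumes GitHub Actions with a profiles.yml available in the repository (adapt the adapter, schedule, and credential handling to your environment):

name: source-freshness
on:
  schedule:
    - cron: "0 * * * *"  # Run hourly
jobs:
  check_freshness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-core dbt-bigquery  # Assumed adapter; install the one for your warehouse
      - run: dbt source freshness --profiles-dir .  # A non-zero exit code on error-level staleness fails the job

To pair freshness checks with column-level tests, declare tests on the source columns in the same sources.yml entry. A minimal example for the loaded_at column:

sources:
  - name: jaffle_shop
    database: raw
    tables:
      - name: raw_orders
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
        columns:
          - name: _etl_loaded_at
            tests:
              - not_null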


Automating Freshness Monitoring in Paradime

To streamline freshness monitoring, use Paradime’s CLI command to generate or update freshness configurations dynamically:

paradime sources generate

This ensures that your sources.yml remains accurate and up to date without manual edits.
