Testing Source Freshness
Ensuring that your source data is up to date is critical for reliable analytics and decision-making. dbt™ provides source freshness checks to verify when data was last updated, helping teams monitor pipeline health and ensure data meets expected SLAs (Service Level Agreements).
Why Test Source Freshness?
Source freshness checks help with the following:
Detect stale data early – Avoid basing decisions on outdated information.
Enforce SLAs – Ensure that critical tables meet data freshness agreements.
Improve data quality – Identify pipeline issues before they impact downstream models.
Integrate with automated workflows – Flag potential failures in CI/CD pipelines.
Configuring Freshness Checks in sources.yml
To enable source freshness checks, add a freshness block inside the sources.yml file. Below is an example configuration:
Example: Basic Freshness Configuration
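A minimal sketch of such a configuration follows. The source name raw_data, the schema, the table names, and the _loaded_at column are all hypothetical, and the thresholds are illustrative rather than recommended values. Depending on your dbt version, these keys may instead belong under a config: block, so check the documentation for the version you run.

```yaml
version: 2

sources:
  - name: raw_data                 # hypothetical source name
    schema: raw                    # hypothetical schema
    loaded_at_field: _loaded_at    # hypothetical timestamp column shared by all tables
    freshness:                     # source-level defaults
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders               # stricter, table-level override
        freshness:
          warn_after: {count: 1, period: hour}
          error_after: {count: 6, period: hour}
      - name: customers            # inherits the source-level thresholds
      - name: country_codes        # static reference data
        freshness: null            # disables the check for this table
```

In this sketch, orders is held to a tighter threshold than the source default, customers inherits the default, and country_codes is excluded from freshness checks entirely.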
Key Configuration Components
freshness block – Defines when dbt should warn or error based on data staleness.
warn_after / error_after – Set the time thresholds, each given as a count and a period (minute, hour, or day), after which dbt issues a warning or an error.
loaded_at_field – Specifies the column that records when data was last loaded.
Table-level overrides – Apply stricter or looser freshness rules for specific tables.
💡 Tip: Use freshness: null to disable freshness checks for a specific table.
Running Freshness Checks in dbt™
Once configured, freshness checks can be executed with the dbt source freshness command.
This command will:
Query the loaded_at_field for the latest timestamp.
Compare it against the current time to assess data staleness.
Return a success, warning, or error based on the defined thresholds.
Optimizing Freshness Checks for Large Tables
For large datasets, running freshness checks across the entire table can be resource-intensive. To improve efficiency, consider adding a filter to the freshness block so that the check scans only recent records.
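As a sketch of how such a filter might look, the fragment below adds a filter key to a table's freshness block. The source, table, and column names are hypothetical, and the filter expression is plain SQL whose syntax depends on your warehouse; the datediff form shown here is an assumption that fits Snowflake-style dialects and will need adapting elsewhere.

```yaml
version: 2

sources:
  - name: raw_data                    # hypothetical source name
    tables:
      - name: events                  # hypothetical large table
        loaded_at_field: _loaded_at   # hypothetical timestamp column
        freshness:
          warn_after: {count: 1, period: hour}
          error_after: {count: 3, period: hour}
          # Restrict the freshness query to rows loaded in the last two days.
          # Warehouse-specific SQL: adjust the function to your dialect.
          filter: "datediff('day', _loaded_at, current_timestamp) < 2"
```

The filter is most useful when it lines up with the table's partitioning or clustering column, since that lets the warehouse skip old data rather than scan it.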
Why Use Filtering?
Improves performance by checking only recent records.
Reduces query costs on large datasets.
Best Practices for Source Freshness Checks
To maximize the effectiveness of freshness testing, follow these best practices:
Set Realistic Thresholds – Define warning and error limits that match data ingestion SLAs.
Monitor Regularly – Integrate freshness checks into automated workflows.
Document Expectations – Clearly define freshness requirements for each source.
Use Column-Level Constraints – Combine freshness checks with schema tests (e.g., not_null).
Exclude Unnecessary Tables – Disable freshness checks for static or rarely updated sources.
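One way to combine a freshness check with a schema test is sketched below: a not_null test on the same column that the freshness check reads. All names are hypothetical. Newer dbt versions prefer the key data_tests over tests, so treat the key name as an assumption to verify against your version.

```yaml
version: 2

sources:
  - name: raw_data                    # hypothetical source name
    tables:
      - name: orders                  # hypothetical table
        loaded_at_field: _loaded_at   # hypothetical timestamp column
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        columns:
          - name: _loaded_at
            tests:
              - not_null              # rows with a missing load timestamp fail the test
```

The two checks complement each other: the freshness check only looks at the latest timestamp, so rows with a null load time would otherwise go unnoticed.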
Automating Freshness Monitoring in Paradime
To streamline freshness monitoring, Paradime's CLI provides a command that generates or updates freshness configurations dynamically. This keeps your sources.yml accurate and up to date without manual edits.