Working with Sources
What Are Sources?
In dbt™, sources represent raw data tables from external systems, such as an operational database, a CRM, or third-party APIs. Instead of referencing raw tables directly in models, dbt™ allows you to define sources in a centralized file (sources.yml) for better organization, maintainability, and documentation.
The sources.yml file is a crucial component of dbt™ projects because it centralizes metadata about raw data tables, keeping references to them consistent, easy to maintain, and automatically documented.
Why Use Sources?
Centralizes raw table definitions – Avoids hardcoded table names across multiple models.
Improves maintainability – If raw table locations change, you only need to update sources.yml.
Enables freshness checks – dbt™ can monitor source data latency.
Enhances documentation – Sources appear in dbt™'s generated documentation and lineage graph, showing which models depend on which raw tables.
Defining Sources in sources.yml
Sources are defined in .yml files under the top-level sources: key.
Breaking it Down:
name – The logical name of the source.
database – The database where the source is located (optional; required only if it differs from your target database).
schema – The schema where the source tables exist.
tables – Lists the tables under this source.
By default, schema is assumed to be the same as name unless explicitly overridden.
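To make the defaulting rules concrete, here is a minimal Python sketch, assuming a sources.yml entry has already been parsed into a dictionary. The function resolve_relations and the crm source are hypothetical; the sketch only illustrates how a table's location follows from name, database, and schema, not how dbt™ itself is implemented.

```python
from typing import Any


def resolve_relations(source: dict[str, Any], target_database: str) -> list[str]:
    """Return database.schema.table names for every table in one source entry."""
    # database is optional: fall back to the target database.
    database = source.get("database", target_database)
    # schema is optional too: fall back to the source's logical name.
    schema = source.get("schema", source["name"])
    return [f"{database}.{schema}.{table['name']}" for table in source["tables"]]


# Hypothetical entry, shaped like one item under `sources:` after YAML parsing.
crm_source = {
    "name": "crm",
    "tables": [{"name": "contacts"}, {"name": "deals"}],
}

print(resolve_relations(crm_source, target_database="analytics"))
# ['analytics.crm.contacts', 'analytics.crm.deals']
```

Setting "database" or "schema" explicitly in the dictionary overrides the corresponding default, just as overriding it in sources.yml would.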
Using the source() Function in Models
Once sources are defined in sources.yml, you can reference them in your dbt™ models with the source() function, which takes the source name and the table name as its arguments.
Example: Querying a Source Table
How source() Works:
Ensures consistency – Avoids hardcoded schema and table names.
Resolves table references at compile time – source() is replaced with the table's fully qualified name from sources.yml, so a table can move without breaking the queries that use it.
Tracks dependencies – Establishes relationships between raw data and dbt™ models.
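A toy model of these three behaviors may help. In this Python sketch, the hypothetical SOURCES registry stands in for what dbt™ builds from sources.yml, and the source() function resolves a (source, table) pair to a relation name while recording a lineage edge. Unlike dbt™'s real source(), which takes only the source and table names and already knows which model is being compiled, this version takes the model name as an extra first argument; relation names are also left unquoted for simplicity.

```python
from collections import defaultdict

# Hypothetical registry: (source name, table name) -> resolved relation.
SOURCES = {
    ("crm", "contacts"): "analytics.crm.contacts",
    ("crm", "deals"): "analytics.crm.deals",
}

# Lineage edges: model name -> set of (source, table) pairs it reads from.
lineage: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)


def source(model: str, source_name: str, table_name: str) -> str:
    """Resolve a source reference and record that `model` depends on it."""
    key = (source_name, table_name)
    if key not in SOURCES:
        # Undefined sources fail loudly instead of producing a bad table name.
        raise KeyError(f"source('{source_name}', '{table_name}') is not defined")
    lineage[model].add(key)
    return SOURCES[key]


sql = f"select * from {source('stg_contacts', 'crm', 'contacts')}"
print(sql)  # select * from analytics.crm.contacts
print(dict(lineage))  # {'stg_contacts': {('crm', 'contacts')}}
```

Because every model goes through the same lookup, changing a table's location in the registry changes every query that references it, and the recorded edges are what a lineage graph would be drawn from.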
Checking Source Freshness
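dbt™ can monitor how fresh your raw data is through source freshness checks, which typically compare the latest value of a timestamp column (configured with loaded_at_field) against the current time. This helps you spot latency in data ingestion pipelines.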
Example: Defining Freshness Expectations
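Running a Freshness Check
Run the dbt source freshness command to check every source that has freshness thresholds configured.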
If the newest data is older than the warn_after threshold, dbt™ reports a warning; if it is older than the error_after threshold, dbt™ reports an error.
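The threshold logic can be sketched in a few lines of Python. This is a simplified model, assuming the two thresholds (dbt™ calls them warn_after and error_after) are expressed as timedelta values and the newest load time is already known; freshness_status is a hypothetical helper, not part of dbt™. It checks the error threshold before the warning threshold so the more severe status wins, and treats a missing threshold as never exceeded.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_status(
    max_loaded_at: datetime,
    now: datetime,
    warn_after: Optional[timedelta] = None,
    error_after: Optional[timedelta] = None,
) -> str:
    """Classify a source as 'pass', 'warn', or 'error' from the age of its newest row."""
    age = now - max_loaded_at
    # Check the error threshold first so the more severe status wins.
    if error_after is not None and age > error_after:
        return "error"
    if warn_after is not None and age > warn_after:
        return "warn"
    # No threshold exceeded (or none configured).
    return "pass"


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_load = now - timedelta(hours=15)  # hypothetical newest loaded_at value
print(freshness_status(last_load, now, warn_after=timedelta(hours=12), error_after=timedelta(hours=24)))
# warn
```

With a 12-hour warning and a 24-hour error threshold, data loaded 15 hours ago is stale enough to warn but not yet to fail.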
Best Practice: Set up freshness checks on critical sources to monitor pipeline delays.
Generating & Updating sources.yml Automatically
Instead of defining sources manually, you can use the Paradime CLI to generate sources.yml dynamically.
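The Paradime CLI handles this for you, and the Python sketch below is not its implementation. It is a hypothetical illustration of what generating the file amounts to, assuming you already have a list of table names (a real generator would read them from warehouse metadata): emitting a sources: block with one entry per table. render_sources_yml is an invented name, and table names are assumed to be plain identifiers that need no YAML quoting.

```python
def render_sources_yml(source_name: str, schema: str, tables: list[str]) -> str:
    """Render a minimal sources.yml for one source (hypothetical helper)."""
    lines = [
        "version: 2",
        "",
        "sources:",
        f"  - name: {source_name}",
        f"    schema: {schema}",
        "    tables:",
    ]
    # One entry per table, sorted for a deterministic file.
    lines += [f"      - name: {table}" for table in sorted(tables)]
    return "\n".join(lines) + "\n"


# Hypothetical table names, e.g. as listed from the warehouse's information schema.
print(render_sources_yml("crm", "raw_crm", ["deals", "contacts"]))
```

Sorting the table names keeps the output stable across runs, so regenerating the file only produces a diff when the set of tables actually changes.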