Define your sources.yml
The sources.yml
file is a crucial component in dbt projects, used to define and document your raw data sources. It centralizes source information and enables features like freshness checking.
Purpose
The sources.yml
file serves several important functions:
Defines the raw data sources used in your dbt project
Documents metadata about your source tables
Enables source freshness checking
Allows use of the
source()
function in your models
Key Components
Declaring Sources
Sources are defined in .yml
files nested under a sources:
key. Here's an example of how to structure your sources.yml
file:
This example defines two sources: jaffle_shop
and stripe
. For jaffle_shop
, we specify the database and schema. Each source lists its available tables.
Note: By default, schema
will be the same as name
. Add schema
only if you want to use a source name that differs from the existing schema.
Using the source() Function
The source()
function is used in your models to reference the sources you've defined. It takes two arguments: the source name and the table name. Here's an example of how to use it in a model:
This SQL query selects data from the customers
and orders
tables in the jaffle_shop
source, joining them on the customer ID.
The source()
function:
Returns a reference to the specified source table
Establishes dependencies between the source and the current model
Compiles to the full table name in your data warehouse
Best Practices
Use meaningful and consistent naming for your sources and tables
Document your sources thoroughly, including descriptions and owners
Set up appropriate freshness checks for critical data sources
Use column-level constraints and tests to ensure data quality
Keep your
sources.yml
file up-to-date as your data sources evolve
Remember, the sources.yml
file should be version controlled and committed to your repository along with your other dbt project files.
To generate and/or update your sources.yml on the fly, use the Paradime generator CLI command:
Last updated