Project Structure
A dbt™ project is the foundation of how you organize, develop, and maintain your data transformations. Think of it as a collection of files that define how your raw data should be transformed into analytics-ready datasets.
Core Components
Every dbt™ project consists of several key components:
1. Project Files
Models – SQL files that define your transformations.
Tests – Data quality checks, defined as generic tests in YAML files or as singular tests in SQL files.
Documentation – YAML files describing your project.
Macros – Reusable code snippets for automation and logic.
2. Configuration Files
dbt_project.yml – Primary project configuration.
sources.yml – Defines raw data sources.
properties.yml – Documents models, tests, and metadata.
packages.yml – Manages external package dependencies.
3. Project Layout
A standard dbt™ project structure looks like this:
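The listing below is an illustrative sketch, assuming a hypothetical project named my_dbt_project and the layered model folders described later in this guide. The folder names follow common conventions and are not requirements.

```text
my_dbt_project/
├── dbt_project.yml          # primary project configuration
├── packages.yml             # external package dependencies
├── models/
│   ├── staging/
│   │   ├── sources.yml      # raw data source definitions
│   │   ├── properties.yml   # model documentation and tests
│   │   └── stg_orders.sql
│   ├── intermediate/
│   │   └── int_order_details.sql
│   └── marts/
│       ├── dim_customers.sql
│       └── fact_sales.sql
├── macros/                  # reusable code snippets
├── tests/                   # singular (SQL) tests
├── seeds/
└── snapshots/
```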
Project Configuration (dbt_project.yml)
The dbt_project.yml file is the heart of your dbt™ project. It:
Identifies your dbt™ project
Sets project-wide settings
Defines how models should be materialized
Controls project behavior
Basic Structure
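A minimal sketch of a dbt_project.yml, assuming a hypothetical project and profile both named my_dbt_project. The path settings shown are the usual defaults, and the values are placeholders to adapt.

```yaml
# dbt_project.yml (illustrative; project and profile names are hypothetical)
name: my_dbt_project          # identifies the project
version: "1.0.0"
config-version: 2
profile: my_dbt_project       # connection profile to use (assumed name)

# Where dbt looks for each type of file
model-paths: ["models"]
macro-paths: ["macros"]
test-paths: ["tests"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]

# Directories removed by `dbt clean`
clean-targets:
  - target
  - dbt_packages

# Project-wide default: build every model as a view unless overridden
models:
  my_dbt_project:
    +materialized: view
```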
This file defines how dbt interacts with your project, where it looks for files, and how models are materialized.
Source Definitions (sources.yml)
Sources represent raw data warehouse tables. Defining them in YAML allows dbt to:
Track dependencies between raw data and models.
Monitor data freshness.
Document the origin of raw data.
Example: Defining Sources
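A minimal sketch of a source definition. The source, database, schema, table, and column names are hypothetical, and the freshness thresholds are arbitrary examples.

```yaml
# models/staging/sources.yml (illustrative; all names are hypothetical)
version: 2

sources:
  - name: raw_shop                 # referenced in models as source('raw_shop', ...)
    database: raw                  # assumed database name
    schema: shop                   # assumed schema holding the raw tables
    description: "Raw tables loaded from the shop application."
    # Freshness settings; recent dbt releases may prefer these under a `config:` block
    loaded_at_field: _loaded_at    # assumed load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        description: "One row per order as loaded from the application."
      - name: customers
        description: "One row per customer."
```

A model can then select from `{{ source('raw_shop', 'orders') }}`, which is how dbt tracks the dependency between the raw table and the model.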
Schema and Metadata Definitions (properties.yml)
The properties.yml file (also known as schema.yml) is where you document models and define tests at the column level.
Example: Model Documentation and Testing
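A minimal sketch, assuming a staging model named stg_orders with hypothetical order_id and customer_id columns. It uses the built-in unique and not_null tests.

```yaml
# models/staging/properties.yml (illustrative; column names are hypothetical)
version: 2

models:
  - name: stg_orders
    description: "Cleaned and renamed orders, one row per order."
    columns:
      - name: order_id
        description: "Primary key of the order."
        tests:                     # newer dbt versions also accept `data_tests:`
          - unique
          - not_null
      - name: customer_id
        description: "Identifier of the customer who placed the order."
        tests:
          - not_null
```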
Column-level tests catch integrity problems such as null or duplicate keys each time they run, and the descriptions feed the generated project documentation.
Package Management (packages.yml)
Packages in dbt™ function like libraries in programming languages, allowing teams to extend dbt’s functionality.
Why Use Packages?
Pre-built Transformations – Utilize pre-tested transformations instead of writing SQL from scratch.
Code Reusability – Share logic across different dbt projects.
Community Best Practices – Leverage solutions built and maintained by the dbt™ community.
Example: Defining Packages
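A minimal sketch of a packages.yml. The version ranges are placeholders, not recommendations; check the dbt package hub for the releases that match your dbt version.

```yaml
# packages.yml (illustrative; version ranges below are placeholders)
packages:
  - package: fivetran/google_ads
    version: [">=0.9.0", "<0.10.0"]   # placeholder range
  - package: fivetran/facebook_ads
    version: [">=0.6.0", "<0.7.0"]    # placeholder range
  - package: fivetran/salesforce
    version: [">=1.0.0", "<1.1.0"]    # placeholder range
```

After editing this file, run `dbt deps` to download the listed packages into the project.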
These packages simplify integrating third-party data sources like Google Ads, Facebook Ads, and Salesforce.
Project Organization: Layered Approach
To keep dbt™ projects maintainable, transformations are structured in layers:
Layer | Purpose | Example
Staging | Cleans and renames raw data | stg_orders.sql
Intermediate | Combines multiple staging models | int_order_details.sql
Marts | Business-ready tables for analytics | dim_customers.sql, fact_sales.sql
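One way to reinforce these layers is to give each folder its own default materialization in dbt_project.yml. The fragment below is a sketch, assuming a hypothetical project named my_dbt_project with staging, intermediate, and marts folders under models/. The materializations chosen per layer are a common pattern, not a rule.

```yaml
# Fragment of dbt_project.yml (illustrative; project name is hypothetical)
models:
  my_dbt_project:
    staging:
      +materialized: view        # lightweight cleanup over raw data
    intermediate:
      +materialized: ephemeral   # inlined into downstream models, not built in the warehouse
    marts:
      +materialized: table       # persisted for analytics queries
```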
Benefits of Layered Organization
Clear Data Lineage – Easily trace transformations.
Modular Transformations – Change the logic inside one layer without rewriting the others.
Efficient Troubleshooting – Isolate issues quickly.
Scalability – Teams can work on different layers independently.
How Everything Works Together
A dbt™ project follows a structured workflow:
Sources define where raw data comes from.
Models transform that data into usable tables and views.
Tests check data integrity before analytics teams use the data.
Materialization determines how transformed data is stored.
Version Control tracks all changes for collaboration.
By structuring your dbt™ project effectively, you can build scalable, maintainable, and high-performance data pipelines.