Project Structure
A dbt project is a collection of files that define how raw data should be transformed into analytics-ready datasets. Understanding the structure of a dbt project helps you organize transformations effectively and collaborate with your team.
Anatomy of a dbt Project
A typical dbt project has a directory structure like the one below. Running dbt init scaffolds most of it; packages.yml is added by hand once the project needs external packages:
dbt_project/
├── models/            # SQL transformations (core of your project)
├── analyses/          # One-off analytical queries
├── tests/             # Custom data tests
├── macros/            # Reusable SQL code blocks
├── snapshots/         # Historical data tracking definitions
├── seeds/             # CSV files to be loaded into the database
├── dbt_project.yml    # Project configuration
├── packages.yml       # External dependency definitions
└── README.md          # Project documentation
Core Components
Models Directory
The models/ directory contains SQL files that define your transformations. Each SQL file typically becomes a table or view in your data warehouse.
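-- models/marts/customers.sql
-- One row per customer, with order count and total order value.
SELECT
    c.customer_id,
    c.name,
    c.email,
    COUNT(o.order_id) AS number_of_orders,
    SUM(o.amount) AS total_order_value
FROM {{ ref('stg_customers') }} AS c
-- LEFT JOIN keeps customers who have not placed any orders
LEFT JOIN {{ ref('stg_orders') }} AS o
    ON c.customer_id = o.customer_id
-- Group by the three customer columns selected above
GROUP BY 1, 2, 3
Models are often organized into subdirectories by function or data domain: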
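As a rough illustration, a function-based layout could look like the sketch below. The subdirectory names are a common convention rather than anything dbt requires.

```text
models/
├── staging/         # light cleanup of raw source tables
├── intermediate/    # joins and reusable building blocks
└── marts/           # final, business-facing models such as customers.sql
```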
Configuration Files
Configuration files define project-wide settings and metadata:
dbt_project.yml
This is the central configuration file for your dbt project. It defines:
Project name and version
Profile to use for database connections
Model materialization settings
Directory configurations
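A minimal sketch of such a file is shown below. The project name my_dbt_project and the profile name my_profile are placeholders, and the staging/intermediate/marts folders are assumed to exist under models/.

```yaml
# dbt_project.yml (illustrative; names are placeholders)
name: 'my_dbt_project'
version: '1.0.0'
config-version: 2

# Must match a profile defined in profiles.yml
profile: 'my_profile'

# Directory configurations (these values are dbt's defaults)
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

# Model materialization settings, applied per subdirectory of models/
models:
  my_dbt_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table
```

Settings under the models: key cascade down the directory tree, so an individual model can still override them in its own config block.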
packages.yml
This file defines external dbt packages that your project depends on:
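For example, a project that uses the dbt_utils package might declare it as follows. The version range is only an assumption for illustration; check the package hub for a range compatible with your dbt version.

```yaml
# packages.yml (illustrative version range)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

After editing this file, run dbt deps to download the listed packages.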
Source Definitions
Sources represent the raw data tables in your warehouse. They're defined in YAML files (typically named sources.yml) within the models directory:
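A minimal sketch of a source definition follows. The source name raw_shop, the database raw, and the schema shop are hypothetical placeholders for wherever your raw data lands.

```yaml
# models/staging/sources.yml (names are hypothetical)
version: 2

sources:
  - name: raw_shop        # name used in source() calls
    database: raw         # assumed database holding the raw data
    schema: shop          # assumed schema holding the raw tables
    tables:
      - name: customers
      - name: orders
```

A staging model would then select from {{ source('raw_shop', 'customers') }} rather than hard-coding the table name.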
Model Organization Patterns
There's no single "right way" to organize your dbt project, but here are common patterns that work well:
Layered Approach
This approach organizes models by their purpose in the transformation pipeline:
| Layer | Purpose | Example | Materialization |
|---|---|---|---|
| Staging | Clean and standardize raw data | stg_customers.sql | View |
| Intermediate | Combine multiple staging models | int_customer_orders.sql | View |
| Marts | Business-ready tables for analytics | dim_customers.sql | Table |
Domain-Based Organization
For larger projects, you might organize by business domain first, then by layer:
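One possible shape for this, with finance and marketing as made-up domain names:

```text
models/
├── finance/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
└── marketing/
    ├── staging/
    ├── intermediate/
    └── marts/
```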
Working with Tests
dbt supports two types of tests:
Schema Tests
These are defined in YAML files alongside your models (current dbt documentation calls them generic tests):
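A small sketch using the built-in unique and not_null tests is shown below. The file name and the descriptions are assumptions; the model and column names follow the customers example earlier on this page.

```yaml
# models/staging/schema.yml (illustrative file name)
version: 2

models:
  - name: stg_customers
    description: "One row per customer, cleaned from the raw data."
    columns:
      - name: customer_id
        description: "Primary key of the customer."
        tests:            # dbt 1.8 and later also accept the key data_tests
          - unique
          - not_null
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```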
Singular Tests
These are custom SQL queries in the tests/ directory that should return zero rows when the test passes:
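As a sketch, the query below selects the rows that violate a rule, so any returned row counts as a failure. The file name is hypothetical, and the rule itself (order amounts are never negative) is an assumption about the data.

```sql
-- tests/assert_no_negative_order_amounts.sql (hypothetical file name)
-- Returns every order with a negative amount.
-- The test passes when this query returns zero rows.
SELECT
    order_id,
    amount
FROM {{ ref('stg_orders') }}
WHERE amount < 0
```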
Real-World Project Organization Example
Here's how a complete e-commerce dbt project might be organized:
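The layout below is only an illustrative sketch that combines the pieces described on this page. The project name and every file name in it are made up.

```text
ecommerce_dbt/
├── dbt_project.yml
├── packages.yml
├── README.md
├── models/
│   ├── staging/
│   │   ├── sources.yml
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   └── stg_products.sql
│   ├── intermediate/
│   │   └── int_customer_orders.sql
│   └── marts/
│       ├── schema.yml
│       ├── dim_customers.sql
│       ├── dim_products.sql
│       └── fct_orders.sql
├── tests/
│   └── assert_no_negative_order_amounts.sql
├── macros/
│   └── cents_to_dollars.sql
├── seeds/
│   └── country_codes.csv
└── snapshots/
    └── customers_snapshot.sql
```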
Best Practices
Be consistent with naming conventions: Use prefixes like stg_, int_, dim_, and fct_ to indicate model purpose
Document as you go: Add descriptions in your YAML files for models and columns
Start simple: Begin with a staging/marts approach and add complexity as needed
Group related models: Keep related transformations close together
Limit cross-layer references: Staging models should only reference sources, intermediate models should only reference staging models, and so on up the pipeline
Use packages: Don't reinvent common patterns when packages can help
By following a structured approach to organizing your dbt project, you'll create a more maintainable, understandable codebase that enables collaboration and scales with your team.