Project Structure
A dbt project is a collection of files that define how raw data should be transformed into analytics-ready datasets. Understanding the structure of a dbt project helps you organize transformations effectively and collaborate with your team.
Anatomy of a dbt Project
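When you initialize a new dbt project, you'll see a directory structure that typically includes models/, tests/, macros/, seeds/, snapshots/, and analyses/ directories, along with a dbt_project.yml file at the project root.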
Core Components
Models Directory
The models/ directory contains SQL files that define your transformations. Each SQL file typically becomes a table or view in your data warehouse.
Models are often organized into subdirectories by function or data domain, for example models/staging/, models/intermediate/, and models/marts/.
Configuration Files
Configuration files define project-wide settings and metadata:
dbt_project.yml
This is the central configuration file for your dbt project. It defines:
Project name and version
Profile to use for database connections
Model materialization settings
Directory configurations
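As a rough illustration of the four items above, a minimal dbt_project.yml might look like the sketch below. The project name my_project, the matching profile name, and the staging and marts folder names are assumptions chosen for illustration. The *-paths keys use the names from dbt 1.0 and later.

```yaml
# dbt_project.yml (illustrative sketch; names are hypothetical)
name: my_project          # project name
version: "1.0.0"          # project version
config-version: 2

profile: my_project       # must match a profile defined in profiles.yml

# Directory configurations (the values shown are dbt's defaults)
model-paths: ["models"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

# Model materialization settings, applied per folder under models/
models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
```

Folder-level settings like these act as defaults. An individual model can still override them in its own config.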
packages.yml
This file defines external dbt packages that your project depends on:
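For example, a packages.yml that pulls in the widely used dbt_utils package could look like this sketch. The version range is an assumption; pin whichever release is compatible with your dbt version.

```yaml
# packages.yml (illustrative; the version range is an assumption)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

After editing this file, running dbt deps downloads the listed packages into the project.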
Source Definitions
Sources represent the raw data tables in your warehouse. They're defined in YAML files (typically named sources.yml) within the models directory:
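A minimal source definition might look like the following sketch. The source name raw_shop, the database and schema names, and the table names are all hypothetical.

```yaml
# models/staging/sources.yml (illustrative; all names are hypothetical)
version: 2

sources:
  - name: raw_shop          # name used in source() calls
    database: raw           # where the raw data lives
    schema: shop
    tables:
      - name: customers
      - name: orders
```

With this in place, a staging model could select from {{ source('raw_shop', 'customers') }} instead of hard-coding the table name.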
Model Organization Patterns
There's no single "right way" to organize your dbt project, but here are common patterns that work well:
Layered Approach
This approach organizes models by their purpose in the transformation pipeline:
Staging: cleans and standardizes raw data (example: stg_customers.sql; materialized as a view)
Intermediate: combines multiple staging models (example: int_customer_orders.sql; materialized as a view)
Marts: business-ready tables for analytics (example: dim_customers.sql; materialized as a table)
Domain-Based Organization
For larger projects, you might organize by business domain first, then by layer:
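One way to picture a domain-first layout is through the models block of dbt_project.yml, which mirrors the folder hierarchy under models/. The sketch below assumes a project named my_project with two hypothetical domains, finance and marketing, each containing staging and marts subfolders.

```yaml
# Fragment of dbt_project.yml (illustrative; project and domain names are hypothetical)
models:
  my_project:
    finance:                # corresponds to models/finance/
      staging:
        +materialized: view
      marts:
        +materialized: table
    marketing:              # corresponds to models/marketing/
      staging:
        +materialized: view
      marts:
        +materialized: table
```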
Working with Tests
dbt supports two types of tests:
Schema Tests
These are defined in YAML files alongside your models:
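A small example, assuming a staging model named stg_customers with a customer_id column, might look like the sketch below. Recent dbt releases call these generic tests and also accept data_tests: as the key name in place of tests:.

```yaml
# models/staging/schema.yml (illustrative; the file name and column are assumptions)
version: 2

models:
  - name: stg_customers
    description: One row per customer, cleaned from the raw source.
    columns:
      - name: customer_id
        description: Primary key for the customer.
        tests:
          - unique      # fails if any customer_id appears more than once
          - not_null    # fails if any customer_id is missing
```

Running dbt test executes these checks against the built model.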
Singular Tests
These are custom SQL queries stored in the tests/ directory. A singular test passes when its query returns zero rows; any rows returned are reported as failures.
Real-World Project Organization Example
Here's how a complete e-commerce dbt project might be organized:
Best Practices
Be consistent with naming conventions: Use prefixes like stg_, int_, dim_, and fct_ to indicate model purpose
Document as you go: Add descriptions in your YAML files for models and columns
Start simple: Begin with a staging/marts approach and add complexity as needed
Group related models: Keep related transformations close together
Limit cross-layer references: Staging models should reference only sources, intermediate models should reference only staging models, and so on.
Use packages: Don't reinvent common patterns when packages can help
By following a structured approach to organizing your dbt project, you'll create a more maintainable, understandable codebase that enables collaboration and scales with your team.