Project Strucuture
A dbt project is a collection of files that define how raw data should be transformed into analytics-ready datasets. Understanding the structure of a dbt project helps you organize transformations effectively and collaborate with your team.
Anatomy of a dbt Project
When you initialize a new dbt project, you'll see a directory structure like this:
dbt_project/
├── models/ # SQL transformations (core of your project)
├── analyses/ # One-off analytical queries
├── tests/ # Custom data tests
├── macros/ # Reusable SQL code blocks
├── snapshots/ # Historical data tracking definitions
├── seeds/ # CSV files to be loaded into the database
├── dbt_project.yml # Project configuration
├── packages.yml # External dependency definitions
└── README.md # Project documentation
Core Components
Models Directory
The models/
directory contains SQL files that define your transformations. Each SQL file typically becomes a table or view in your data warehouse.
-- models/marts/customers.sql
SELECT
c.customer_id,
c.name,
c.email,
COUNT(o.order_id) as number_of_orders,
SUM(o.amount) as total_order_value
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3
Models are often organized into subdirectories by function or data domain:
models/
├── staging/ # Initial cleaning/renaming of source tables
├── intermediate/ # Intermediate transformations
└── marts/ # Business-level presentation models
Configuration Files
Configuration files define project-wide settings and metadata:
dbt_project.yml
This is the central configuration file for your dbt project. It defines:
Project name and version
Profile to use for database connections
Model materialization settings
Directory configurations
name: 'ecommerce'
version: '1.0.0'
config-version: 2
profile: 'ecommerce'
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
models:
ecommerce:
staging:
+materialized: view
intermediate:
+materialized: view
marts:
+materialized: table
packages.yml
This file defines external dbt packages that your project depends on:
yamlCopypackages:
- package: dbt-labs/dbt_utils
version: 0.8.0
- package: calogica/dbt_expectations
version: 0.5.0
Source Definitions
Sources represent the raw data tables in your warehouse. They're defined in YAML files (typically named sources.yml
) within the models directory:
version: 2
sources:
- name: raw_data
schema: raw
tables:
- name: customers
columns:
- name: id
tests:
- unique
- not_null
- name: orders
columns:
- name: id
tests:
- unique
- not_null
Model Organization Patterns
There's no single "right way" to organize your dbt project, but here are common patterns that work well:
Layered Approach
This approach organizes models by their purpose in the transformation pipeline:
Staging
Clean and standardize raw data
stg_customers.sql
View
Intermediate
Combine multiple staging models
int_customer_orders.sql
View
Marts
Business-ready tables for analytics
dim_customers.sql
Table
Domain-Based Organization
For larger projects, you might organize by business domain first, then by layer:
models/
├── marketing/
│ ├── staging/
│ ├── intermediate/
│ └── marts/
├── finance/
│ ├── staging/
│ ├── intermediate/
│ └── marts/
└── product/
├── staging/
├── intermediate/
└── marts/
Working with Tests
dbt supports two types of tests:
Schema Tests
These are defined in YAML files alongside your models:
version: 2
models:
- name: customers
columns:
- name: customer_id
tests:
- unique
- not_null
- name: email
tests:
- unique
Singular Tests
These are custom SQL queries in the tests/
directory that should return zero rows when the test passes:
-- tests/orders_with_invalid_customer.sql
SELECT
o.order_id
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
Real-World Project Organization Example
Here's how a complete e-commerce dbt project might be organized:
ecommerce_dbt/
├── models/
│ ├── staging/
│ │ ├── stg_customers.sql
│ │ ├── stg_orders.sql
│ │ └── sources.yml
│ ├── intermediate/
│ │ ├── int_order_items.sql
│ │ └── int_customer_orders.sql
│ └── marts/
│ ├── core/
│ │ ├── dim_customers.sql
│ │ ├── dim_products.sql
│ │ ├── fact_orders.sql
│ │ └── schema.yml
│ └── marketing/
│ ├── customer_segmentation.sql
│ └── campaign_performance.sql
├── seeds/
│ ├── country_codes.csv
│ └── product_categories.csv
├── snapshots/
│ └── order_status_history.sql
├── macros/
│ ├── generate_schema_name.sql
│ └── cents_to_dollars.sql
├── tests/
│ └── order_price_check.sql
├── analyses/
│ └── customer_ltv.sql
├── dbt_project.yml
└── packages.yml
Best Practices
Be consistent with naming conventions: Use prefixes like
stg_
,int_
,dim_
andfct_
to indicate model purposeDocument as you go: Add descriptions in your YAML files for models and columns
Start simple: Begin with a staging/marts approach and add complexity as needed
Group related models: Keep related transformations close together
Limit cross-schema references: Staging should only reference sources, intermediate should only reference staging, etc.
Use packages: Don't reinvent common patterns when packages can help
By following a structured approach to organizing your dbt project, you'll create a more maintainable, understandable codebase that enables collaboration and scales with your team.
Last updated
Was this helpful?