Project Structure

A dbt project is a collection of files that define how raw data should be transformed into analytics-ready datasets. Understanding the structure of a dbt project helps you organize transformations effectively and collaborate with your team.

Anatomy of a dbt Project

When you initialize a new dbt project, you'll see a directory structure similar to this (packages.yml is usually added later, when you first install a package):

dbt_project/
├── models/          # SQL transformations (core of your project)
├── analyses/        # One-off analytical queries
├── tests/           # Custom data tests
├── macros/          # Reusable SQL code blocks
├── snapshots/       # Historical data tracking definitions
├── seeds/           # CSV files to be loaded into the database
├── dbt_project.yml  # Project configuration
├── packages.yml     # External dependency definitions
└── README.md        # Project documentation

Core Components

Models Directory

The models/ directory contains SQL files that define your transformations. Each SQL file typically becomes a table or view in your data warehouse.

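-- models/marts/customers.sql
-- One row per customer, with order count and total order value.
SELECT
    c.customer_id,
    c.name,
    c.email,
    COUNT(o.order_id) AS number_of_orders,
    -- Customers without orders would otherwise get NULL here
    COALESCE(SUM(o.amount), 0) AS total_order_value
FROM {{ ref('stg_customers') }} AS c
LEFT JOIN {{ ref('stg_orders') }} AS o
    ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3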

Models are often organized into subdirectories by function or data domain.
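As a rough sketch, the customers model above and the two staging models it references could be laid out like this. The layout is one common convention, not something dbt requires:

```text
models/
├── staging/
│   ├── stg_customers.sql
│   └── stg_orders.sql
└── marts/
    └── customers.sql
```

dbt resolves ref() by model name, not by path, so moving a file between subdirectories does not break references to it.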

Configuration Files

Configuration files define project-wide settings and metadata.

dbt_project.yml

This is the central configuration file for your dbt project. It defines:

  • Project name and version

  • Profile to use for database connections

  • Model materialization settings

  • Directory configurations
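To make the four items above concrete, here is a minimal dbt_project.yml sketch. The project name my_dbt_project and the profile name are hypothetical, and the subdirectory names under models assume a staging/intermediate/marts layout:

```yaml
# dbt_project.yml (illustrative; names are hypothetical)
name: 'my_dbt_project'
version: '1.0.0'
config-version: 2

# Must match a profile defined in profiles.yml
profile: 'my_dbt_project'

# Directory configurations
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

# Model materialization settings, applied per subdirectory of models/
models:
  my_dbt_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table
```

Settings under models are inherited by every model in the matching subdirectory, and an individual model can still override them in its own config.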

packages.yml

This file defines external dbt packages that your project depends on.
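A small packages.yml might look like the following. dbt_utils is a widely used package, but the version range shown here is only an assumption, so check the package's release notes for the range that matches your dbt version:

```yaml
# packages.yml (illustrative; the version range is an assumption)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

After editing this file, run dbt deps to download the listed packages into your project.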

Source Definitions

Sources represent the raw data tables in your warehouse. They're defined in YAML files (typically named sources.yml) within the models directory.
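As a sketch, a source definition could look like this. The source name ecommerce, the database raw, and the schema public are hypothetical placeholders for wherever your raw data lands:

```yaml
# models/staging/sources.yml (illustrative; all names are hypothetical)
version: 2

sources:
  - name: ecommerce
    database: raw
    schema: public
    tables:
      - name: customers
        description: Raw customer records as loaded from the application database.
      - name: orders
        description: Raw order records, one row per order.
```

Models then select from these tables with {{ source('ecommerce', 'customers') }} instead of hard-coding the table name, which lets dbt track lineage back to the raw data.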


Model Organization Patterns

There's no single "right way" to organize your dbt project, but here are common patterns that work well:

Layered Approach

This approach organizes models by their purpose in the transformation pipeline:

Layer         Purpose                              Example                  Typical Materialization
------------  -----------------------------------  -----------------------  -----------------------
Staging       Clean and standardize raw data       stg_customers.sql        View
Intermediate  Combine multiple staging models      int_customer_orders.sql  View
Marts         Business-ready tables for analytics  dim_customers.sql        Table
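To illustrate the staging layer, here is a minimal sketch of a staging model. It assumes a hypothetical source named ecommerce with a customers table has been declared, and the raw column names (id, name, email) are assumptions as well:

```sql
-- models/staging/stg_customers.sql (illustrative; source and column names are hypothetical)
SELECT
    id AS customer_id,        -- rename to a consistent key name
    TRIM(name) AS name,       -- strip stray whitespace
    LOWER(email) AS email     -- standardize casing for matching
FROM {{ source('ecommerce', 'customers') }}
```

Staging models like this stay deliberately thin: renaming, casting, and light cleanup only, with joins and aggregations left to the intermediate and marts layers.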

Domain-Based Organization

For larger projects, you might organize by business domain first, then by layer.
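A domain-first layout might look like the sketch below. The domain names sales and marketing are hypothetical examples:

```text
models/
├── sales/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
└── marketing/
    ├── staging/
    ├── intermediate/
    └── marts/
```

With this layout, each team can own its domain folder while the layers inside it still follow the same conventions.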


Working with Tests

dbt supports two types of tests:

Schema Tests

These are defined in YAML files alongside your models. Current dbt documentation refers to them as generic tests.
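For example, a YAML file next to a customers model could declare tests like this. The file name and column choices are assumptions; unique and not_null are tests built into dbt. Newer dbt versions also accept data_tests as the key name in place of tests:

```yaml
# models/marts/schema.yml (illustrative; file name and columns are assumptions)
version: 2

models:
  - name: customers
    description: One row per customer with order totals.
    columns:
      - name: customer_id
        description: Primary key.
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
```

Running dbt test compiles each entry into a query and reports a failure for any test that finds offending rows.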

Singular Tests

These are custom SQL queries in the tests/ directory. A singular test passes when its query returns zero rows; any rows it returns are reported as failures.
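As a sketch, a singular test for the customers model shown earlier could check that no customer has a negative order total. The file name is hypothetical:

```sql
-- tests/assert_total_order_value_is_not_negative.sql (illustrative; file name is hypothetical)
-- Returns the offending rows, so the test passes only when nothing is returned.
SELECT
    customer_id,
    total_order_value
FROM {{ ref('customers') }}
WHERE total_order_value < 0
```

Writing the query to select the bad rows, not a count, makes failures easier to debug because the failing records show up directly in the test output.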


Real-World Project Organization Example

A complete e-commerce dbt project typically combines the pieces described above: layered models, source definitions, tests, and configuration files.
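One possible layout is sketched below. Every file and folder name here is hypothetical and simply follows the naming conventions used on this page:

```text
ecommerce_dbt/
├── dbt_project.yml
├── packages.yml
├── models/
│   ├── staging/
│   │   ├── sources.yml
│   │   ├── stg_customers.sql
│   │   └── stg_orders.sql
│   ├── intermediate/
│   │   └── int_customer_orders.sql
│   └── marts/
│       ├── schema.yml
│       ├── dim_customers.sql
│       └── fct_orders.sql
├── tests/
│   └── assert_no_negative_order_amounts.sql
├── macros/
├── seeds/
└── snapshots/
```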

Best Practices

  • Be consistent with naming conventions: Use prefixes like stg_, int_, dim_ and fct_ to indicate model purpose

  • Document as you go: Add descriptions in your YAML files for models and columns

  • Start simple: Begin with a staging/marts approach and add complexity as needed

  • Group related models: Keep related transformations close together

  • Limit cross-layer references: Staging models should select only from sources, intermediate models only from staging models, and marts only from the layers beneath them

  • Use packages: Don't reinvent common patterns when packages can help

By following a structured approach to organizing your dbt project, you'll create a more maintainable, understandable codebase that enables collaboration and scales with your team.
