Project Strucuture

A dbt project is a collection of files that define how raw data should be transformed into analytics-ready datasets. Understanding the structure of a dbt project helps you organize transformations effectively and collaborate with your team.

Anatomy of a dbt Project

When you initialize a new dbt project, you'll see a directory structure like this:

dbt_project/
├── models/          # SQL transformations (core of your project)
├── analyses/        # One-off analytical queries
├── tests/           # Custom data tests
├── macros/          # Reusable SQL code blocks
├── snapshots/       # Historical data tracking definitions
├── seeds/           # CSV files to be loaded into the database
├── dbt_project.yml  # Project configuration
├── packages.yml     # External dependency definitions
└── README.md        # Project documentation

Core Components

Models Directory

The models/ directory contains SQL files that define your transformations. Each SQL file typically becomes a table or view in your data warehouse.

-- models/marts/customers.sql
SELECT
    c.customer_id,
    c.name,
    c.email,
    COUNT(o.order_id) as number_of_orders,
    SUM(o.amount) as total_order_value
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3

Models are often organized into subdirectories by function or data domain:

models/
├── staging/          # Initial cleaning/renaming of source tables
├── intermediate/     # Intermediate transformations
└── marts/            # Business-level presentation models

Configuration Files

Configuration files define project-wide settings and metadata:

dbt_project.yml

This is the central configuration file for your dbt project. It defines:

Project name and version
Profile to use for database connections
Model materialization settings
Directory configurations

name: 'ecommerce'
version: '1.0.0'
config-version: 2

profile: 'ecommerce'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  ecommerce:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table

packages.yml

This file defines external dbt packages that your project depends on:

yamlCopypackages:
  - package: dbt-labs/dbt_utils
    version: 0.8.0
  - package: calogica/dbt_expectations
    version: 0.5.0

Source Definitions

Sources represent the raw data tables in your warehouse. They're defined in YAML files (typically named sources.yml) within the models directory:

version: 2

sources:
  - name: raw_data
    schema: raw
    tables:
      - name: customers
        columns:
          - name: id
            tests:
              - unique
              - not_null
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null

Model Organization Patterns

There's no single "right way" to organize your dbt project, but here are common patterns that work well:

Layered Approach

This approach organizes models by their purpose in the transformation pipeline:

Layer

Purpose

Example

Typical Materialization

Staging

Clean and standardize raw data

stg_customers.sql

View

Intermediate

Combine multiple staging models

int_customer_orders.sql

View

Marts

Business-ready tables for analytics

dim_customers.sql

Table

Domain-Based Organization

For larger projects, you might organize by business domain first, then by layer:

models/
├── marketing/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
├── finance/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
└── product/
    ├── staging/
    ├── intermediate/
    └── marts/

Working with Tests

dbt supports two types of tests:

Schema Tests

These are defined in YAML files alongside your models:

version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique

Singular Tests

These are custom SQL queries in the tests/ directory that should return zero rows when the test passes:

-- tests/orders_with_invalid_customer.sql
SELECT
    o.order_id
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
    ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL

Real-World Project Organization Example

Here's how a complete e-commerce dbt project might be organized:

ecommerce_dbt/
├── models/
│   ├── staging/
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   └── sources.yml
│   ├── intermediate/
│   │   ├── int_order_items.sql
│   │   └── int_customer_orders.sql
│   └── marts/
│       ├── core/
│       │   ├── dim_customers.sql
│       │   ├── dim_products.sql
│       │   ├── fact_orders.sql
│       │   └── schema.yml
│       └── marketing/
│           ├── customer_segmentation.sql
│           └── campaign_performance.sql
├── seeds/
│   ├── country_codes.csv
│   └── product_categories.csv
├── snapshots/
│   └── order_status_history.sql
├── macros/
│   ├── generate_schema_name.sql
│   └── cents_to_dollars.sql
├── tests/
│   └── order_price_check.sql
├── analyses/
│   └── customer_ltv.sql
├── dbt_project.yml
└── packages.yml

Best Practices

Be consistent with naming conventions: Use prefixes like stg_, int_, dim_ and fct_ to indicate model purpose
Document as you go: Add descriptions in your YAML files for models and columns
Start simple: Begin with a staging/marts approach and add complexity as needed
Group related models: Keep related transformations close together
Limit cross-schema references: Staging should only reference sources, intermediate should only reference staging, etc.
Use packages: Don't reinvent common patterns when packages can help

By following a structured approach to organizing your dbt project, you'll create a more maintainable, understandable codebase that enables collaboration and scales with your team.

PreviousIntroduction NextWorking with Sources

Last updated 8 months ago

Was this helpful?