# Project Strucuture

A dbt project is a collection of files that define how raw data should be transformed into analytics-ready datasets. Understanding the structure of a dbt project helps you organize transformations effectively and collaborate with your team.

### Anatomy of a dbt Project

When you initialize a new dbt project, you'll see a directory structure like this:

```
dbt_project/
├── models/          # SQL transformations (core of your project)
├── analyses/        # One-off analytical queries
├── tests/           # Custom data tests
├── macros/          # Reusable SQL code blocks
├── snapshots/       # Historical data tracking definitions
├── seeds/           # CSV files to be loaded into the database
├── dbt_project.yml  # Project configuration
├── packages.yml     # External dependency definitions
└── README.md        # Project documentation
```

***

### Core Components

#### Models Directory

The `models/` directory contains SQL files that define your transformations. Each SQL file typically becomes a table or view in your data warehouse.

```sql
-- models/marts/customers.sql
SELECT
    c.customer_id,
    c.name,
    c.email,
    COUNT(o.order_id) as number_of_orders,
    SUM(o.amount) as total_order_value
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3
```

Models are often organized into subdirectories by function or data domain:

```
models/
├── staging/          # Initial cleaning/renaming of source tables
├── intermediate/     # Intermediate transformations
└── marts/            # Business-level presentation models
```

#### Configuration Files

Configuration files define project-wide settings and metadata:

**dbt\_project.yml**

This is the central configuration file for your dbt project. It defines:

* Project name and version
* Profile to use for database connections
* Model materialization settings
* Directory configurations

```yaml
name: 'ecommerce'
version: '1.0.0'
config-version: 2

profile: 'ecommerce'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  ecommerce:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table
```

**packages.yml**

This file defines external dbt packages that your project depends on:

```yaml
yamlCopypackages:
  - package: dbt-labs/dbt_utils
    version: 0.8.0
  - package: calogica/dbt_expectations
    version: 0.5.0
```

#### Source Definitions

Sources represent the raw data tables in your warehouse. They're defined in YAML files (typically named `sources.yml`) within the models directory:

```yaml
version: 2

sources:
  - name: raw_data
    schema: raw
    tables:
      - name: customers
        columns:
          - name: id
            tests:
              - unique
              - not_null
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null
```

***

### Model Organization Patterns

There's no single "right way" to organize your dbt project, but here are common patterns that work well:

#### Layered Approach

This approach organizes models by their purpose in the transformation pipeline:

| Layer        | Purpose                             | Example                   | Typical Materialization |
| ------------ | ----------------------------------- | ------------------------- | ----------------------- |
| Staging      | Clean and standardize raw data      | `stg_customers.sql`       | View                    |
| Intermediate | Combine multiple staging models     | `int_customer_orders.sql` | View                    |
| Marts        | Business-ready tables for analytics | `dim_customers.sql`       | Table                   |

#### Domain-Based Organization

For larger projects, you might organize by business domain first, then by layer:

```
models/
├── marketing/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
├── finance/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
└── product/
    ├── staging/
    ├── intermediate/
    └── marts/
```

***

### Working with Tests

dbt supports two types of tests:

#### Schema Tests

These are defined in YAML files alongside your models:

```yaml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
```

#### Singular Tests

These are custom SQL queries in the `tests/` directory that should return zero rows when the test passes:

```sql
-- tests/orders_with_invalid_customer.sql
SELECT
    o.order_id
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
    ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
```

***

### Real-World Project Organization Example

Here's how a complete e-commerce dbt project might be organized:

```
ecommerce_dbt/
├── models/
│   ├── staging/
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   └── sources.yml
│   ├── intermediate/
│   │   ├── int_order_items.sql
│   │   └── int_customer_orders.sql
│   └── marts/
│       ├── core/
│       │   ├── dim_customers.sql
│       │   ├── dim_products.sql
│       │   ├── fact_orders.sql
│       │   └── schema.yml
│       └── marketing/
│           ├── customer_segmentation.sql
│           └── campaign_performance.sql
├── seeds/
│   ├── country_codes.csv
│   └── product_categories.csv
├── snapshots/
│   └── order_status_history.sql
├── macros/
│   ├── generate_schema_name.sql
│   └── cents_to_dollars.sql
├── tests/
│   └── order_price_check.sql
├── analyses/
│   └── customer_ltv.sql
├── dbt_project.yml
└── packages.yml
```

### Best Practices

* **Be consistent with naming conventions**: Use prefixes like `stg_`, `int_`, `dim_` and `fct_` to indicate model purpose
* **Document as you go**: Add descriptions in your YAML files for models and columns
* **Start simple**: Begin with a staging/marts approach and add complexity as needed
* **Group related models**: Keep related transformations close together
* **Limit cross-schema references**: Staging should only reference sources, intermediate should only reference staging, etc.
* **Use packages**: Don't reinvent common patterns when packages can help

By following a structured approach to organizing your dbt project, you'll create a more maintainable, understandable codebase that enables collaboration and scales with your team.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.paradime.io/app-help/concepts/dbt-fundamentals/getting-started-with-dbt/project-strucuture.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
