Project Structure

A dbt project is a collection of files that define how raw data should be transformed into analytics-ready datasets. Understanding the structure of a dbt project helps you organize transformations effectively and collaborate with your team.

Anatomy of a dbt Project

When you initialize a new dbt project, you'll see a directory structure like this:

dbt_project/
├── models/          # SQL transformations (core of your project)
├── analyses/        # One-off analytical queries
├── tests/           # Custom data tests
├── macros/          # Reusable SQL code blocks
├── snapshots/       # Historical data tracking definitions
├── seeds/           # CSV files to be loaded into the database
├── dbt_project.yml  # Project configuration
├── packages.yml     # External dependency definitions
└── README.md        # Project documentation

Core Components

Models Directory

The models/ directory contains SQL files that define your transformations. Each SQL file typically becomes a table or view in your data warehouse.

-- models/marts/customers.sql
-- One row per customer, with order counts and total order value
SELECT
    c.customer_id,
    c.name,
    c.email,
    COUNT(o.order_id) as number_of_orders,
    SUM(o.amount) as total_order_value
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3

Models are often organized into subdirectories by function or data domain:

models/
├── staging/          # Initial cleaning/renaming of source tables
├── intermediate/     # Intermediate transformations
└── marts/            # Business-level presentation models
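
For example, a model in the staging layer usually does little more than rename and lightly clean columns from a single raw table. A minimal sketch (the column names are illustrative; the source() function is explained under Source Definitions below):

-- models/staging/stg_customers.sql (illustrative sketch)
SELECT
    id AS customer_id,
    name,
    email
FROM {{ source('raw_data', 'customers') }}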

Configuration Files

Configuration files define project-wide settings and metadata:

dbt_project.yml

This is the central configuration file for your dbt project. It defines:

  • Project name and version

  • Profile to use for database connections

  • Model materialization settings

  • Directory configurations

name: 'ecommerce'
version: '1.0.0'
config-version: 2

profile: 'ecommerce'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  ecommerce:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table
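
Settings in dbt_project.yml apply to entire directories. An individual model can override them with an inline config() block at the top of its SQL file; for example, to materialize one staging model as a table despite the view default (an illustrative override, not part of the example project above):

-- models/staging/stg_orders.sql (illustrative override of the directory default)
{{ config(materialized='table') }}

SELECT
    id AS order_id,
    customer_id,
    amount
FROM {{ source('raw_data', 'orders') }}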

packages.yml

This file defines external dbt packages that your project depends on:

packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.0
  - package: calogica/dbt_expectations
    version: 0.5.0
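
After editing packages.yml, install the dependencies by running dbt deps from your project directory; dbt downloads each package into the dbt_packages/ folder, after which its macros and models are available to your project:

dbt deps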

Source Definitions

Sources represent the raw data tables in your warehouse. They're defined in YAML files (typically named sources.yml) within the models directory:

version: 2

sources:
  - name: raw_data
    schema: raw
    tables:
      - name: customers
        columns:
          - name: id
            tests:
              - unique
              - not_null
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null
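
Models select from these tables with the source() function (as in the staging sketch above) rather than hardcoding schema and table names, which lets dbt track lineage across your project. The column tests attached here run like any other dbt test; to run only the tests defined on this source:

dbt test --select source:raw_data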

Model Organization Patterns

There's no single "right way" to organize your dbt project, but here are common patterns that work well:

Layered Approach

This approach organizes models by their purpose in the transformation pipeline:

Layer        | Purpose                              | Example                  | Typical Materialization
-------------|--------------------------------------|--------------------------|------------------------
Staging      | Clean and standardize raw data       | stg_customers.sql        | View
Intermediate | Combine multiple staging models      | int_customer_orders.sql  | View
Marts        | Business-ready tables for analytics  | dim_customers.sql        | Table
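
Under this scheme, an intermediate model typically joins and reshapes staging models without yet applying business-level aggregation. A minimal sketch (the ordered_at column is illustrative):

-- models/intermediate/int_customer_orders.sql (illustrative sketch)
SELECT
    c.customer_id,
    o.order_id,
    o.amount,
    o.ordered_at
FROM {{ ref('stg_customers') }} c
INNER JOIN {{ ref('stg_orders') }} o
    ON c.customer_id = o.customer_id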

Domain-Based Organization

For larger projects, you might organize by business domain first, then by layer:

models/
├── marketing/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
├── finance/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
└── product/
    ├── staging/
    ├── intermediate/
    └── marts/

Working with Tests

dbt supports two types of tests:

Schema Tests

These are defined in YAML files alongside your models:

version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique

Singular Tests

These are custom SQL queries in the tests/ directory that should return zero rows when the test passes:

-- tests/orders_with_invalid_customer.sql
-- Returns orders whose customer_id has no match in stg_customers; passes when empty
SELECT
    o.order_id
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
    ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
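
Both kinds of tests are executed with the dbt test command, either across the whole project or scoped with a selector:

dbt test                      # run every schema and singular test
dbt test --select customers   # run only the tests attached to the customers model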

Real-World Project Organization Example

Here's how a complete e-commerce dbt project might be organized:

ecommerce_dbt/
├── models/
│   ├── staging/
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   └── sources.yml
│   ├── intermediate/
│   │   ├── int_order_items.sql
│   │   └── int_customer_orders.sql
│   └── marts/
│       ├── core/
│       │   ├── dim_customers.sql
│       │   ├── dim_products.sql
│       │   ├── fact_orders.sql
│       │   └── schema.yml
│       └── marketing/
│           ├── customer_segmentation.sql
│           └── campaign_performance.sql
├── seeds/
│   ├── country_codes.csv
│   └── product_categories.csv
├── snapshots/
│   └── order_status_history.sql
├── macros/
│   ├── generate_schema_name.sql
│   └── cents_to_dollars.sql
├── tests/
│   └── order_price_check.sql
├── analyses/
│   └── customer_ltv.sql
├── dbt_project.yml
└── packages.yml

Best Practices

  • Be consistent with naming conventions: Use prefixes like stg_, int_, dim_ and fct_ to indicate model purpose

  • Document as you go: Add descriptions in your YAML files for models and columns (see the sketch after this list)

  • Start simple: Begin with a staging/marts approach and add complexity as needed

  • Group related models: Keep related transformations close together

  • Limit cross-layer references: Staging models should only reference sources, intermediate models should only reference staging, and so on

  • Use packages: Don't reinvent common patterns when packages can help
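
As a sketch of documenting as you go, descriptions live in the same YAML files as your schema tests (the wording here is illustrative):

version: 2

models:
  - name: customers
    description: One row per customer, with order counts and total order value.
    columns:
      - name: customer_id
        description: Primary key for the customer.
        tests:
          - unique
          - not_null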

By following a structured approach to organizing your dbt project, you'll create a more maintainable, understandable codebase that enables collaboration and scales with your team.
