Creating dbt Sources from Your Data Warehouse

Setting up source definitions in dbt™ projects involves querying information schemas, documenting tables and columns, ensuring proper YAML formatting, and keeping sources up to date as new tables are added. Done by hand, this process is time-consuming and error-prone.

DinoAI Agent can automatically generate complete and accurate sources.yml files by directly accessing your data warehouse metadata.

Example Prompt

I uploaded some new data to my data warehouse. Can you create a sources.yml file?

Optional: When updating existing sources with new tables, you can add context by selecting your existing sources.yml file. Context helps DinoAI understand your existing project structure and make more relevant changes.

How It Works

After you enter your prompt:

  1. DinoAI connects to your data warehouse and scans available schemas and tables

  2. It retrieves column information including data types

  3. If configured, DinoAI applies your .dinorules preferences

  4. It generates a properly formatted sources.yml file
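
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not DinoAI's actual implementation: it assumes the column metadata has already been fetched as (table, column, data type) rows, for example from a query against information_schema.columns, and the function name render_sources_yaml is hypothetical. The YAML is rendered by hand so the sketch has no dependencies.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the result of a query against
# information_schema.columns: (table_name, column_name, data_type).
ROWS = [
    ("CIRCUITS", "CIRCUITID", "NUMBER"),
    ("CIRCUITS", "CIRCUITREF", "VARCHAR"),
    ("CONSTRUCTORS", "CONSTRUCTORID", "NUMBER"),
]


def render_sources_yaml(rows, source_name, database, schema):
    """Group column rows by table and render a dbt sources file as text."""
    tables = defaultdict(list)
    for table, column, data_type in rows:
        tables[table].append((column, data_type))

    lines = [
        "version: 2",
        "",
        "sources:",
        f"  - name: {source_name}",
        f"    database: {database}",
        f"    schema: {schema}",
        "    tables:",
    ]
    # Sort tables by name so repeated runs produce a stable file.
    for table in sorted(tables):
        lines.append(f"      - name: {table}")
        lines.append("        columns:")
        for column, data_type in tables[table]:
            lines.append(f"          - name: {column}")
            lines.append(f"            data_type: {data_type}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sources_yaml(ROWS, "formula_one", "FORMULA_ONE_DB", "raw"))
```

Sorting tables by name is a design choice of this sketch: it keeps the generated file stable between runs, which makes later diffs easier to review.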

Note: If you're updating an existing YAML file rather than creating a new one, DinoAI preserves the existing documentation and adds only the new tables.
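
The update behavior described in this note can be pictured as an append-only merge. The sketch below is an assumption about the general idea, not DinoAI's actual merge logic: the function name merge_tables, the case-insensitive name comparison, and the sample description text are all hypothetical.

```python
def merge_tables(existing_tables, discovered_tables):
    """Return existing tables unchanged, plus discovered tables not yet listed.

    Both arguments are lists of dicts shaped like the entries under
    `tables:` in a dbt sources file. Table names are compared
    case-insensitively, which is an assumption of this sketch.
    """
    known = {table["name"].lower() for table in existing_tables}
    merged = list(existing_tables)  # existing entries keep their documentation
    for table in discovered_tables:
        key = table["name"].lower()
        if key not in known:
            merged.append(table)
            known.add(key)
    return merged


# Hypothetical data: one table is already documented, one is new.
existing = [
    {"name": "CIRCUITS", "description": "One row per race circuit."},
]
discovered = [
    {"name": "CIRCUITS", "columns": [{"name": "CIRCUITID", "data_type": "NUMBER"}]},
    {"name": "CONSTRUCTORS", "columns": [{"name": "CONSTRUCTORID", "data_type": "NUMBER"}]},
]

merged = merge_tables(existing, discovered)
print([table["name"] for table in merged])  # ['CIRCUITS', 'CONSTRUCTORS']
```

In this sketch the already documented CIRCUITS entry is left untouched, including its description, and only CONSTRUCTORS is appended.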

Example Output

DinoAI will generate a properly formatted sources.yml file like this:

version: 2

sources:
  - name: formula_one
    database: FORMULA_ONE_DB
    schema: raw
    tables:
      - name: CIRCUITS
        columns:
          - name: CIRCUITID
            data_type: NUMBER
          - name: CIRCUITREF
            data_type: VARCHAR
          # Additional columns...
            
      - name: CONSTRUCTORS
        columns:
          - name: CONSTRUCTORID
            data_type: NUMBER
          # Additional columns...
      
      # Additional tables...

Key Benefits

  • Time Savings: Reduces a 30+ minute manual task to seconds

  • Accuracy: Eliminates typos and formatting errors

  • Maintainability: Makes it easy to keep sources up to date as your warehouse evolves

  • Completeness: Captures every table and column exposed in your warehouse metadata

When to Use This

  • When setting up a new dbt™ project

  • When data engineers have added new tables to your warehouse

  • During data migrations or schema updates

  • Any time your source data structure changes
