Add and Run dbt™ tests

Testing is a crucial part of any data transformation pipeline. dbt has testing built in: each test is a query that selects the rows violating an assertion about a model, and the test passes when that query returns no rows.

Purpose

Adding and running tests in dbt serves several important functions:

  • Validates the integrity of your data transformations

  • Ensures that your models meet expected criteria

  • Catches errors early in the development process

  • Provides confidence in the reliability of your data pipeline

  • Facilitates collaboration by setting clear expectations for data quality

Key Components

Built-in Generic Tests

dbt provides four generic tests out of the box:

  1. not_null: Ensures a column contains no null values

  2. unique: Ensures a column contains no duplicate values

  3. accepted_values: Validates that all values in a column are within a specified list

  4. relationships: Checks referential integrity, so that every value in a column exists in a column of another model

Implementing Tests in YAML

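Generic tests are attached to models and columns in YAML properties files, usually stored alongside the models they describe. From dbt 1.8 onward the key is named data_tests, and tests remains supported as an alias. The example below applies all four built-in tests to columns of a customers model: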

version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive', 'pending']
      - name: country_id
        tests:
          - relationships:
              to: ref('countries')
              field: id

Custom Test Names

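By default, dbt generates a test's name from the test type, the model, the column, and the arguments. You can override it with the name property, which makes the test easier to recognize in logs and to select on the command line: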

models:
  - name: orders
    columns:
      - name: status
        tests:
          - accepted_values:
              name: valid_order_status
              values: ['placed', 'shipped', 'completed', 'returned']

Alternative Test Definition Format

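For tests that carry several arguments or configurations, you can write the test as a dictionary instead: name sets the custom test name, test_name identifies the generic test to run, and arguments and config sit at the same level. In this example, the where config restricts the test to rows whose order_date is today: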

models:
  - name: orders
    columns:
      - name: status
        tests:
          - name: valid_order_status
            test_name: accepted_values
            values: ['placed', 'shipped', 'completed', 'returned']
            config:
              where: "order_date = current_date"

Running Tests

To run all tests in your project:

dbt test

To run only the tests attached to a specific model, use --select (the older --models flag is deprecated):

dbt test --select model_name
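
Test runs can be narrowed further with dbt's node selection syntax. The following is a minimal sketch, not a definitive recipe: the function names are hypothetical, the customers model is borrowed from the YAML example earlier on this page, and the DBT variable is an assumption added only so the commands can be dry-run by setting DBT=echo. The selectors themselves are standard dbt syntax, but it is worth checking them against your installed version.

```shell
#!/usr/bin/env bash
# DBT is a hypothetical override used for illustration; it defaults to the real dbt CLI.
DBT="${DBT:-dbt}"

# Run only the tests attached to one model, e.g. `test_model customers`.
test_model() { "$DBT" test --select "$1"; }

# Run the tests on a model and on everything downstream of it.
# The trailing "+" is dbt's graph operator for "this node and its children".
test_model_and_children() { "$DBT" test --select "$1+"; }

# Run only generic tests (the ones declared in YAML) across the whole project.
test_generic_only() { "$DBT" test --select "test_type:generic"; }
```

Calling test_model customers runs the tests defined on the customers model. With DBT=echo set, each function prints the arguments it would pass to dbt instead of executing them.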

Best Practices

  1. Implement tests for all critical data quality assumptions

  2. Use a combination of generic and custom tests for comprehensive coverage

  3. Write clear, descriptive names for your tests

  4. Include tests that validate your business logic, not just data integrity

  5. Run tests frequently, ideally as part of your CI/CD pipeline

  6. Review and update tests as your data models evolve

  7. Document the purpose and expected outcomes of your tests
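
Practice 5 could look roughly like the shell step below in a CI job. This is a sketch under stated assumptions: the CI environment already has dbt installed and a working connection profile, the function name ci_run_tests is hypothetical, and the DBT variable exists only to allow a dry run. It uses dbt build, which runs models and their tests in dependency order; if your models are already built, dbt test could be substituted.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate. DBT defaults to the real CLI and can be overridden for a dry run.
DBT="${DBT:-dbt}"

ci_run_tests() {
  # Install package dependencies first, then build models and run their tests.
  # dbt exits with a non-zero status when a test fails with error severity,
  # so the function's return code lets the CI job fail along with it.
  "$DBT" deps && "$DBT" build
}
```

A CI job would call ci_run_tests as its final step and rely on the exit status to mark the run as passed or failed.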

Remember, thorough testing is key to building reliable data pipelines and maintaining trust in your data.
