Testing Data Quality

Why Testing Matters in dbt™

Ensuring data quality is a critical part of any data transformation pipeline. Inaccurate, incomplete, or inconsistent data can lead to incorrect business decisions. dbt™ provides built-in testing capabilities to help catch issues early, enforce data integrity, and maintain confidence in your transformations.

Benefits of dbt™ Testing

  • Ensures Data Integrity – Prevents duplicates, null values, and referential mismatches.

  • Validates Business Logic – Confirms that data meets expected criteria.

  • Catches Issues Early – Detects errors before they affect downstream analytics.

  • Automates Quality Checks – Reduces the need for manual data validation.

  • Supports Collaboration – Helps teams align on data expectations.


Types of dbt™ Tests

dbt™ provides two main types of tests: generic tests (pre-built) and singular tests (user-defined).

1. Generic Tests (Built-in)

dbt™ includes four generic tests that can be applied to columns in your models.

Test
Purpose

not_null

Ensures a column contains no NULL values.

unique

Ensures a column has no duplicate values.

accepted_values

Validates that column values are within a specified list.

relationships

Ensures referential integrity between tables.

2. Singular Tests

If built-in tests are not enough, dbt™ allows users to define singular tests using SQL and Jinja. These are specific, one-off tests that are written as SQL queries that return failing records.


Adding Tests to Your dbt™ Models

Tests in dbt™ are typically defined in YAML files, inside a .yml file for a model.

Example: Model and Corresponding Test File

If you have a model named orders, its schema file containing tests should follow this structure:

models/
  ├── marts/
  │   ├── orders.sql        # The dbt model
  │   ├── schema.yml        # The test definitions for this model

Inside schema.yml, you can define the tests for the orders model:

schema.yml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
      - name: email
        data_tests:
          - unique
      - name: status
        data_tests:
          - accepted_values:
              values: ['active', 'inactive', 'pending']
      - name: country_id
        data_tests:
          - relationships:
              to: ref('countries')
              field: id

How This Works:

  • not_null: Ensures customer_id is always populated.

  • unique: Prevents duplicate email addresses.

  • accepted_values: Ensures status is only 'active', 'inactive', or 'pending'.

  • relationships: Verifies that every country_id in customers exists in the countries table.


Defining Singular Tests

If your data quality needs are more specific, you can create singular tests using SQL.

Example: Singular Test for Negative Order Totals

-- tests/assert_no_negative_order_totals.sql

SELECT *
FROM {{ ref('orders') }}
WHERE total_amount < 0

If this query returns any rows, the test fails, indicating an issue with negative order amounts.


Running Tests in dbt™

dbt™ makes it easy to run tests on demand or as part of a CI/CD pipeline.

Run All Tests in Your Project

dbt test

Run Tests on a Specific Model

dbt test --models customers

Run a Single Test

dbt test --select test_name

Filter Tests by a Tag

dbt test --select tag:critical
# This runs only tests marked as "critical" in your YAML file.

Troubleshooting Test Failures

If a test fails, dbt™ provides a log with details. Here's how to investigate issues:

  • Use dbt debug to check for project configuration issues.

  • Check the generated SQL by inspecting the target/compiled directory.

  • Store failing rows for review using store_failures: true in your test configuration.

  • Filter test runs with dbt test --select tag:critical to focus on high-priority data issues.

Running tests frequently and resolving failures quickly keeps your data pipeline reliable and trustworthy.


Best Practices for dbt™ Testing

  • Test All Critical Fields – Apply unique, not_null, and relationships tests on key columns.

  • Use a Mix of Generic and Singular Tests – Generic tests catch common issues, while singular tests validate business rules.

  • Write Clear, Descriptive Test Names – Make it easy to understand test failures.

  • Integrate Tests into CI/CD Pipelines – Run tests automatically before deploying changes.

  • Review and Update Tests Regularly – Data models evolve, and tests should too.


Advanced Testing Concepts

Unit Testing

For more complex testing needs, dbt™ also supports unit testing concepts to validate transformation logic. Unit tests allow you to verify that specific transformations produce expected outputs given controlled inputs. More details on unit testing can be found in our Unit Testing documentation.

Last updated

Was this helpful?