# Testing Data Quality

Ensuring data quality is critical for any analytics pipeline. dbt provides built-in testing capabilities that help catch issues early, enforce data integrity, and maintain confidence in your transformations. This guide explains how to implement tests in your dbt project.

### Why Testing Matters in dbt

Data tests serve several essential purposes:

* **Validate assumptions** about your data
* **Catch errors** before they impact downstream consumers
* **Document expectations** about data properties
* **Ensure consistency** across transformations

Without testing, issues can creep into your data pipeline, potentially leading to incorrect business decisions or loss of trust in your analytics.

{% hint style="info" %}
**Benefits of dbt Testing**

* **Ensures Data Integrity** – Prevents duplicates, null values, and referential mismatches.
* **Validates Business Logic** – Confirms that data meets expected criteria.
* **Catches Issues Early** – Detects errors before they affect downstream analytics.
* **Automates Quality Checks** – Reduces the need for manual data validation.
* **Supports Collaboration** – Helps teams align on data expectations.
  {% endhint %}

***

### Types of dbt Tests

dbt supports two main types of tests:

#### 1. Generic Tests (Built-in)

Generic tests are reusable test definitions that can be applied to multiple models and columns. dbt includes four built-in generic tests:

| Test              | Purpose                                                  | Example Use                          |
| ----------------- | -------------------------------------------------------- | ------------------------------------ |
| `unique`          | Ensures a column has no duplicate values                 | Primary keys, email addresses        |
| `not_null`        | Ensures a column contains no NULL values                 | Required fields, join keys           |
| `accepted_values` | Validates that column values are within a specified list | Status fields, categories            |
| `relationships`   | Ensures referential integrity between tables             | Foreign keys, dimensional references |

#### 2. Singular Tests

Singular tests are custom SQL queries that define specific test logic. These are one-off tests written as SQL queries that return failing records.

***

### Adding Tests to Your Models

Tests in dbt are typically defined in YAML files alongside your models.

#### Generic Tests Example

```yaml
# models/schema.yml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive', 'pending']
      - name: country_id
        tests:
          - relationships:
              to: ref('countries')
              field: id
```

This YAML configuration:

* Tests that `customer_id` values are unique and not null
* Tests that `email` values are unique and not null
* Tests that `status` values are only 'active', 'inactive', or 'pending'
* Tests that each `country_id` exists in the `countries` table

#### Singular Test Example

Singular tests are SQL files in the `tests/` directory:

```sql
-- tests/assert_total_payment_amount_matches_order_amount.sql
-- This test checks that payment amounts sum to order amounts
SELECT
  orders.order_id,
  orders.amount as order_amount,
  SUM(payments.amount) as payment_amount
FROM {{ ref('orders') }}
LEFT JOIN {{ ref('payments') }} ON orders.order_id = payments.order_id
GROUP BY orders.order_id, orders.amount
HAVING ABS(orders.amount - SUM(COALESCE(payments.amount, 0))) > 0.01
```

This test identifies orders where the payment amounts don't match the order amount.

***

### Running Tests

dbt makes it easy to run tests as part of your workflow.

| Operation                    | Command                          | Description                                     |
| ---------------------------- | -------------------------------- | ----------------------------------------------- |
| Running All Tests            | `dbt test`                       | Runs all tests in your dbt project              |
| Testing Specific Models      | `dbt test --models customers`    | Runs all tests associated with a specific model |
| Running a Single Test        | `dbt test --select test_name`    | Runs a specific test by name                    |
| Testing Critical Models Only | `dbt test --select tag:critical` | Runs tests only for models tagged as 'critical' |

***

### Test Configuration Options

You can configure how tests behave using additional parameters.

#### Setting Severity Levels

Tests can be warnings instead of errors:

```yaml
models:
  - name: orders
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['completed', 'shipped', 'returned']
              severity: warn  # Won't cause pipelines to fail
```

#### Store Test Failures

Save test failures for analysis:

```yaml
models:
  - name: large_table
    columns:
      - name: id
        tests:
          - unique:
              config:
                store_failures: true  # Saves failures to a table
```

#### Limiting Failure Volume

Control how many failures are reported:

```yaml
models:
  - name: events
    columns:
      - name: event_id
        tests:
          - unique:
              config:
                limit: 100  # Only show first 100 failures
```

***

### Creating Custom Generic Tests

You can extend dbt's testing capabilities by creating custom generic tests as macros:

```sql
-- macros/test_is_valid_email.sql
{% macro test_is_valid_email(model, column_name) %}

select *
from {{ model }}
where 
    {{ column_name }} is not null 
    and not regexp_like(
        {{ column_name }}, 
        '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
    )

{% endmacro %}
```

Then use it just like built-in tests:

```yaml
models:
  - name: customers
    columns:
      - name: email
        tests:
          - is_valid_email
```

***

### Test Organization Strategies

As your project grows, organizing tests becomes important:

#### Test by Business Domain

Group tests alongside the models they validate:

```
models/
├── marketing/
│   ├── schema.yml      # Contains tests for marketing models
│   ├── campaigns.sql
│   └── ad_performance.sql
└── finance/
    ├── schema.yml      # Contains tests for finance models
    ├── transactions.sql
    └── accounts.sql
```

#### Centralized Tests

Maintain all tests in a dedicated location:

```
models/
└── ...
tests/
├── generic/            # Custom generic tests
│   └── is_valid_email.sql
└── singular/           # Singular tests
    ├── marketing/
    │   └── campaign_consistency.sql
    └── finance/
        └── transaction_reconciliation.sql
```

***

### Troubleshooting Failed Tests

When tests fail, dbt provides information to help diagnose the issue:

1. **Review test SQL**: dbt outputs the SQL for the failing test
2. **Examine failing records**: Look at examples of failing records
3. **Check compiled SQL**: Review the compiled test SQL in `target/compiled/`
4. **Store failures**: Use `store_failures: true` to analyze failure patterns

***

### Test Output Examples

#### Passing Tests

```
18:42:10 | 1 of 4 START test not_null_orders_order_id ............................ [RUN]
18:42:10 | 1 of 4 PASS not_null_orders_order_id ................................. [PASS in 0.08s]
18:42:10 | 2 of 4 START test unique_orders_order_id ............................. [RUN]
18:42:10 | 2 of 4 PASS unique_orders_order_id ................................... [PASS in 0.10s]
```

#### Failing Tests

```
18:42:11 | 3 of 4 START test accepted_values_orders_status_completed__shipped__returned ... [RUN]
18:42:11 | 3 of 4 FAIL accepted_values_orders_status_completed__shipped__returned ... [FAIL in 0.15s]
18:42:11 | 4 of 4 START test relationships_orders_customer_id__customer_id__ref_customers_ ... [RUN]
18:42:11 | 4 of 4 FAIL relationships_orders_customer_id__customer_id__ref_customers_ ... [FAIL in 0.19s]
```

For failing tests, dbt shows details about the failures:

```
Failure in test relationships_orders_customer_id__customer_id__ref_customers_
Got 2 results, expected 0.

compiled SQL at target/compiled/.../relationships_orders_customer_id__customer_id__ref_customers_.sql
```

***

### Best Practices for dbt Testing

**Test Coverage Strategy**

* **Test primary keys** with `unique` and `not_null`
* **Test foreign keys** with `relationships`
* **Test categorical fields** with `accepted_values`
* **Test business logic** with singular tests
* Focus on testing critical data first

**Test Organization**

* Use a consistent naming convention for singular tests
* Group related tests together
* Document what each test validates

**Test Execution**

* Run tests before finalizing model changes
* Include tests in CI/CD pipelines
* Alert on test failures in production

**Test Maintainability**

* Prefer generic tests for common validations
* Create custom generic tests for repeated patterns
* Use macros to generate complex test logic {% endhint %}

By implementing a robust testing strategy in dbt, you can ensure your data transformations maintain high quality and reliability, building trust in your analytics data among stakeholders.
