# Column anomalies

The `elementary.column_anomalies` test executes column-level monitors and anomaly detection on a specific column. It checks the data type of the column and only executes monitors that are relevant to it.

### How it works

1. The test analyzes the specified column in the table. It can analyze as many columns as you specify.
2. Based on the data type of the column, it applies relevant monitors.
3. You can specify which monitors to run using the `column_anomalies` parameter.

### Default Monitors by Data Type

| Data quality metric  | Column Type |
| -------------------- | ----------- |
| `null_count`         | any         |
| `null_percent`       | any         |
| `min_length`         | string      |
| `max_length`         | string      |
| `average_length`     | string      |
| `missing_count`      | string      |
| `missing_percent`    | string      |
| `min`                | numeric     |
| `max`                | numeric     |
| `average`            | numeric     |
| `zero_count`         | numeric     |
| `zero_percent`       | numeric     |
| `standard_deviation` | numeric     |
| `variance`           | numeric     |

**Opt-in monitors by type:**

| Data quality metric | Column Type |
| ------------------- | ----------- |
| `sum`               | numeric     |
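
Because `sum` is opt-in, it has to be listed explicitly under `column_anomalies` to run. The snippet below is a minimal sketch of that, not a reference configuration: the model name `orders` and the columns `amount` and `created_at` are hypothetical and stand in for your own names.

```yml
models:
  - name: orders                        # hypothetical model name
    config:
      elementary:
        timestamp_column: created_at    # hypothetical timestamp column
    columns:
      - name: amount                    # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - sum                   # opt-in monitor, named explicitly
                - average               # default numeric monitor, named to keep it
```

Since the default monitors run only when `column_anomalies` is not specified, any default monitor you still want alongside `sum` should be named in the same list.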

{% tabs %}
{% tab title="Models" %}

```yml
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
              time_bucket: # Daily by default
                period: < time period >
                count: < number of periods >

  - name: < model name >
    ## if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
```

{% endtab %}

{% tab title="Models example" %}

```yml
models:
  - name: login_events
    config:
      elementary:
        timestamp_column: 'loaded_at'
    columns:
      - name: user_name
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - missing_count
                - min_length
              where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'"
              time_bucket:
                period: day
                count: 1
              tags: ['elementary']

  - name: users
    ## if no timestamp is configured, elementary will monitor without time filtering
    tests:
      - elementary.volume_anomalies:
          tags: ['elementary']
    columns:
      - name: user_id
        tests:
          - elementary.column_anomalies:
              tags: ['elementary']
              timestamp_column: 'updated_at'
              where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'"
              time_bucket:
                period: < time period >
                count: < number of periods >
      - name: user_name
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - missing_count
                - min_length
              tags: ['elementary']
```

{% endtab %}
{% endtabs %}

### Test configuration <a href="#test-configuration" id="test-configuration"></a>

```yaml
tests:
  - elementary.column_anomalies:
      column_anomalies: column monitors list
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
```
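
The parameter list above can be filled in as follows. This is an illustrative sketch only: it reuses the `login_events` model from the example tab, and every value shown (sensitivity, periods, thresholds) is an assumption chosen for demonstration, not a recommended or default setting.

```yaml
models:
  - name: login_events
    columns:
      - name: user_name
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_count
                - missing_count
              timestamp_column: loaded_at
              anomaly_sensitivity: 3      # illustrative value
              anomaly_direction: spike    # report increases only
              training_period:            # illustrative: 30 days of history
                period: day
                count: 30
              detection_period:           # illustrative: check the last 2 days
                period: day
                count: 2
              time_bucket:
                period: day
                count: 1
              seasonality: day_of_week
              ignore_small_changes:       # illustrative thresholds, in percent
                spike_failure_percent_threshold: 10
                drop_failure_percent_threshold: 10
```

Setting `anomaly_direction: spike` fits count-style monitors such as `null_count`, where a drop in nulls is usually not worth an alert. Omit any parameter you do not need to override.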

{% hint style="info" %}

### Important Notes

* No configuration is mandatory, but configuring a `timestamp_column` is highly recommended.
* Use `column_anomalies` to specify which monitors to run (if not specified, all default monitors will run).
* The `where_expression` can be used to filter the data being tested.
* If no timestamp is configured, Elementary will monitor without time filtering.
* Use tags to run Elementary tests in a dedicated run.
* You can configure the test at the model level or at the column level.
  {% endhint %}
