# Column anomalies

The `elementary.column_anomalies` test executes column-level monitors and anomaly detection on a specific column. It checks the data type of the column and only executes monitors that are relevant to it.

### How it works

1. The test analyzes the column it is attached to. You can add it to as many columns as you need.
2. Based on the column's data type, it runs the relevant monitors.
3. To run only specific monitors, list them in the `column_anomalies` parameter. Otherwise, all default monitors for the column's type run.
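The smallest configuration looks like the following sketch. It assumes a hypothetical `orders` model with a numeric `amount` column and an `order_date` timestamp. Because `column_anomalies` is omitted, the test runs every default monitor that applies to the column's type. For a numeric column, those are the `any` and `numeric` rows in the default monitors table.

```yaml
models:
  - name: orders                       # hypothetical model name
    config:
      elementary:
        timestamp_column: order_date   # hypothetical timestamp column
    columns:
      - name: amount                   # hypothetical numeric column
        tests:
          # No column_anomalies list: all default monitors relevant
          # to the column's data type are run.
          - elementary.column_anomalies
```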

### Default Monitors by Data Type

| Data quality metric  | Column Type |
| -------------------- | ----------- |
| `null_count`         | any         |
| `null_percent`       | any         |
| `min_length`         | string      |
| `max_length`         | string      |
| `average_length`     | string      |
| `missing_count`      | string      |
| `missing_percent`    | string      |
| `min`                | numeric     |
| `max`                | numeric     |
| `average`            | numeric     |
| `zero_count`         | numeric     |
| `zero_percent`       | numeric     |
| `standard_deviation` | numeric     |
| `variance`           | numeric     |
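To run only part of the default set, list the monitors explicitly. In this sketch, a hypothetical numeric `amount` column is limited to null, zero, and average checks, and only the listed monitors run.

```yaml
models:
  - name: orders                 # hypothetical model
    columns:
      - name: amount             # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:  # only these monitors run
                - null_percent   # applies to any column type
                - zero_percent   # numeric columns
                - average        # numeric columns
```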

**Opt-in monitors by type:**

| Data quality metric | Column Type |
| ------------------- | ----------- |
| `sum`               | numeric     |
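Opt-in monitors do not run by default, so `sum` must be named in `column_anomalies`. Because an explicit list replaces the default set, this sketch lists `sum` together with the other monitors it still wants. The model, column, and timestamp names are hypothetical.

```yaml
models:
  - name: orders                       # hypothetical model
    config:
      elementary:
        timestamp_column: order_date   # hypothetical timestamp column
    columns:
      - name: amount                   # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - sum           # opt-in: runs only when listed
                - average       # default monitors must also be listed
                - null_count    # once an explicit list is given
```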

{% tabs %}
{% tab title="Models" %}

```yml
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
              time_bucket: # Daily by default
                period: < time period >
                count: < number of periods >

  - name: < model name >
    ## if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
```

{% endtab %}

{% tab title="Models example" %}

```yml
models:
  - name: login_events
    config:
      elementary:
        timestamp_column: 'loaded_at'
    columns:
      - name: user_name
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - missing_count
                - min_length
              where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'"
              time_bucket:
                period: day
                count: 1
              tags: ['elementary']

  - name: users
    ## no model-level timestamp: tests without their own timestamp_column monitor without time filtering
    tests:
      - elementary.volume_anomalies:
          tags: ['elementary']
    columns:
      - name: user_id
        tests:
          - elementary.column_anomalies:
              tags: ['elementary']
              timestamp_column: 'updated_at'
              where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'"
              time_bucket:
                period: day
                count: 1
      - name: user_name
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - missing_count
                - min_length
              tags: ['elementary']
```

{% endtab %}
{% endtabs %}

### Test configuration <a href="#test-configuration" id="test-configuration"></a>

```yaml
tests:
  - elementary.column_anomalies:
      column_anomalies: column monitors list
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: sql expression
```
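As an illustration only, a filled-in version of the reference above could look like the following sketch. The names `orders`, `amount`, `order_date`, and the `status` filter are hypothetical, and the values are examples rather than defaults or recommendations. Referencing `metric_date` inside `anomaly_exclude_metrics` is an assumption; check Elementary's reference for the columns that expression may use.

```yaml
models:
  - name: orders                                  # hypothetical model
    columns:
      - name: amount                              # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies: [null_count, average, zero_percent]
              timestamp_column: order_date        # hypothetical timestamp column
              where_expression: "status != 'test'"  # hypothetical filter
              anomaly_sensitivity: 3              # standard deviations before a value is flagged
              anomaly_direction: spike            # flag increases only
              training_period:                    # history used to build the expected range
                period: day
                count: 30
              detection_period:                   # most recent window checked for anomalies
                period: day
                count: 2
              time_bucket:                        # one metric value per day
                period: day
                count: 1
              seasonality: day_of_week            # compare each day with the same weekday
              detection_delay:                    # exclude the most recent day (late data)
                period: day
                count: 1
              ignore_small_changes:
                spike_failure_percent_threshold: 10   # ignore spikes below a 10% change
              anomaly_exclude_metrics: metric_date = '2024-01-01'  # assumed column name
```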

{% hint style="info" %}

#### Important Notes

* No configuration is mandatory, but configuring a `timestamp_column` is highly recommended.
* Use `column_anomalies` to specify which monitors to run. If it is not specified, all default monitors for the column's type run.
* Use `where_expression` to filter the data being tested.
* If no timestamp is configured, Elementary monitors without time filtering.
* Use tags to run Elementary tests in a dedicated run, for example `dbt test --select tag:elementary`.
* You can configure the test at the model level (for example, `timestamp_column` under `config.elementary`) or at the column level.

{% endhint %}

