Column anomalies

The elementary.column_anomalies test runs column-level monitors and anomaly detection on the column it is attached to. It checks the column's data type and runs only the monitors that are relevant to that type.

How it works

  1. The test analyzes the specified column in the table. You can add the test to as many columns as you want to monitor.

  2. Based on the data type of the column, it applies relevant monitors.

  3. You can specify which monitors to run using the column_anomalies parameter.
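The three steps above can be illustrated with a small schema file. This is a sketch rather than a reference configuration: the model name `customers` and the column name `email` are hypothetical, and the column is assumed to be a string. The monitor names are taken from the default monitors table on this page.

```yaml
# Illustrative only: "customers" and "email" are hypothetical names.
models:
  - name: customers
    columns:
      - name: email                  # assumed to be a string column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_count         # relevant to any column type
                - missing_percent    # string monitor
                - max_length         # string monitor
```

Because the test checks the column's data type, only monitors that match the type of `email` would be expected to run.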

Default Monitors by Data Type

Data quality metric    Column type
null_count             any
null_percent           any
min_length             string
max_length             string
average_length         string
missing_count          string
missing_percent        string
min                    numeric
max                    numeric
average                numeric
zero_count             numeric
zero_percent           numeric
standard_deviation     numeric
variance               numeric

Opt-in monitors by type:

Data quality metric    Column type
sum                    numeric
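As a sketch of how an opt-in monitor might be enabled, the example below lists `sum` explicitly under `column_anomalies`. It assumes that an explicit list replaces the default set, so the default monitors that should keep running are listed next to it. The names `orders`, `created_at`, and `amount` are hypothetical.

```yaml
# Illustrative only: "orders", "created_at" and "amount" are hypothetical names.
models:
  - name: orders
    config:
      elementary:
        timestamp_column: created_at
    columns:
      - name: amount                 # assumed to be a numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - sum                # opt-in monitor, must be listed explicitly
                - average            # default numeric monitor, kept alongside sum
                - zero_count         # default numeric monitor, kept alongside sum
```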

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
              time_bucket: # Daily by default
                period: < time period >
                count: < number of periods >

  - name: < model name >
    ## if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >

Test configuration

tests:
  - elementary.column_anomalies:
      column_anomalies: column monitors list
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
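The schematic above can be filled in as follows. This is a sketch: the model, column, and filter (`orders`, `amount`, `created_at`, `status = 'completed'`) are hypothetical, and the numeric values are chosen only to show the shape of each parameter, not as recommended settings.

```yaml
# Illustrative only: names and values are assumptions, not recommendations.
models:
  - name: orders
    columns:
      - name: amount
        tests:
          - elementary.column_anomalies:
              timestamp_column: created_at
              where_expression: "status = 'completed'"
              anomaly_sensitivity: 3
              anomaly_direction: drop       # one of: both, spike, drop
              detection_period:
                period: day
                count: 2
              training_period:
                period: day
                count: 14
              time_bucket:
                period: day
                count: 1
              seasonality: day_of_week
              ignore_small_changes:
                drop_failure_percent_threshold: 10
```

Note that the test arguments are nested under the test name, two levels deeper than the list dash. If they sit at the same level as the test name, they are parsed as sibling keys rather than as arguments of the test.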

Important Notes

  • No configuration is mandatory, but configuring a timestamp_column is highly recommended.

  • Use column_anomalies to specify which monitors to run (if not specified, all default monitors will run).

  • The where_expression can be used to filter the data being tested.

  • If no timestamp is configured, Elementary will monitor without time filtering.

  • Use tags to run Elementary tests in a dedicated run.

  • You can configure the test at the model level or at the column level.
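The last two notes can be combined in one sketch: a model-level `timestamp_column`, plus a tag on the column-level test so that it can be selected in a dedicated run. The tag name `elementary` and the names `orders`, `created_at`, and `amount` are hypothetical. The tag is set through dbt's standard test `config` block, which is assumed to be available in the dbt version in use.

```yaml
# Illustrative only: names and the tag are hypothetical.
models:
  - name: orders
    config:
      elementary:
        timestamp_column: created_at   # model-level setting
    columns:
      - name: amount
        tests:
          - elementary.column_anomalies:
              config:
                tags: ["elementary"]   # hypothetical tag name
# A dedicated run could then select only the tagged tests, for example:
#   dbt test --select tag:elementary
```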
