All columns anomalies

elementary.all_columns_anomalies

Executes column-level monitors and anomaly detection on all the columns of the table. The specific monitors are listed in the tables below and can be configured using the column_anomalies configuration.

The test checks the data type of each column and executes only the monitors relevant to that type. Use the column_anomalies parameter to override the default monitors, and exclude_prefix / exclude_regexp to exclude columns from the test.
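As a rough illustration of the overrides described above, the sketch below limits the test to three monitors and skips some columns. The model name `orders`, the `updated_at` column, the `tmp_` prefix and the regular expression are all hypothetical. How the pattern is matched against column names (full or partial match) is not stated here, so treat the expression as an assumption.

```yaml
models:
  - name: orders  # hypothetical model name
    config:
      elementary:
        timestamp_column: updated_at  # hypothetical timestamp column
    tests:
      - elementary.all_columns_anomalies:
          # Run only these monitors instead of the defaults
          column_anomalies:
            - null_count
            - min_length
            - max
          # Skip columns whose names start with this prefix (hypothetical value)
          exclude_prefix: tmp_
          # Skip columns whose names match this pattern (hypothetical value)
          exclude_regexp: ".*_raw$"
```

Because only type-relevant monitors are executed, min_length here would apply to string columns and max to numeric columns, while null_count would apply to every column that is not excluded.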

Default monitors by type:

| Data quality metric | Column type |
| --- | --- |
| null_count | any |
| null_percent | any |
| min_length | string |
| max_length | string |
| average_length | string |
| missing_count | string |
| missing_percent | string |
| min | numeric |
| max | numeric |
| average | numeric |
| zero_count | numeric |
| zero_percent | numeric |
| standard_deviation | numeric |
| variance | numeric |

Opt-in monitors by type:

| Data quality metric | Column type |
| --- | --- |
| sum | numeric |
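Opt-in monitors are not part of the default set, so they presumably run only when requested. A minimal sketch, assuming that listing `sum` in column_anomalies is how it is enabled, and that any default monitors you still want must be listed alongside it because the list overrides the defaults. The model name `payments` is hypothetical.

```yaml
models:
  - name: payments  # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          column_anomalies:
            - null_count  # default monitor, kept explicitly
            - average     # default numeric monitor, kept explicitly
            - sum         # opt-in numeric monitor
```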

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.all_columns_anomalies:
          column_anomalies: < specific monitors, all if null >
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >

Test configuration

No configuration is mandatory, but configuring a timestamp_column is highly recommended.

tests:
  - elementary.all_columns_anomalies:
      timestamp_column: column name
      column_anomalies: column monitors list
      exclude_prefix: string
      exclude_regexp: regex
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
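To make the placeholders in the reference above more concrete, here is one possible filled-in configuration. It is a sketch only: the model name, column name, filter expression and every numeric value are hypothetical choices for illustration, not recommended settings or documented defaults.

```yaml
models:
  - name: orders  # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          timestamp_column: updated_at          # hypothetical column
          where_expression: "status != 'test'"  # hypothetical filter
          anomaly_sensitivity: 3                # illustrative value
          anomaly_direction: drop               # one of: both, spike, drop
          training_period:
            period: day
            count: 30                           # illustrative value
          detection_period:
            period: day
            count: 2                            # illustrative value
          time_bucket:
            period: day
            count: 1
          seasonality: day_of_week
          ignore_small_changes:
            spike_failure_percent_threshold: 10  # illustrative value
            drop_failure_percent_threshold: 10   # illustrative value
```

The arguments are nested one level deeper than the test name, so that they are parsed as arguments of the test and not as sibling keys.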
