All columns anomalies

The elementary.all_columns_anomalies test executes column-level monitors and anomaly detection on all columns of the table. It checks the data type of each column and only executes monitors that are relevant to it.

How it works

  • The test analyzes all columns in the table.

  • Based on the data type of each column, it applies relevant monitors.

  • You can override default monitors using the column_anomalies parameter.

  • Columns can be excluded using exclude_prefix or exclude_regexp parameters.
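As a minimal sketch of the last two points, the configuration below limits the test to two monitors and skips columns by prefix. The model name `orders` and the `tmp_` prefix are hypothetical. The parameter names are the ones listed above.

```yaml
models:
  - name: orders                # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          # Run only these monitors instead of the defaults
          column_anomalies:
            - null_count
            - null_percent
          # Skip columns whose names start with this prefix (hypothetical value)
          exclude_prefix: "tmp_"
```

To exclude columns by pattern rather than by prefix, exclude_regexp takes a regular expression in place of the prefix string.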

Default Monitors by Data Type

Data quality metric    Column type
-------------------    -----------
null_count             any
null_percent           any
min_length             string
max_length             string
average_length         string
missing_count          string
missing_percent        string
min                    numeric
max                    numeric
average                numeric
zero_count             numeric
zero_percent           numeric
standard_deviation     numeric
variance               numeric

Opt-in monitors by type:

Data quality metric    Column type
-------------------    -----------
sum                    numeric
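Since sum is not among the defaults, it presumably has to be listed explicitly in column_anomalies. The sketch below assumes that an explicit list replaces the defaults, so any default monitors that should keep running are listed alongside sum. The model name is hypothetical.

```yaml
models:
  - name: payments              # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          column_anomalies:
            - sum               # opt-in monitor, numeric columns only
            - null_count        # default monitors kept explicitly
            - zero_count
```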

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.all_columns_anomalies:
          column_anomalies: < specific monitors, all if null >
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >

Test configuration

No configuration is mandatory, but configuring a timestamp_column is highly recommended.

tests:
  - elementary.all_columns_anomalies:
      timestamp_column: column name
      column_anomalies: column monitors list
      exclude_prefix: string
      exclude_regexp: regex
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
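A filled-in version of the template above might look like the following. This is only a sketch: every value is illustrative rather than a recommended setting, and the column name and filter expression are hypothetical.

```yaml
tests:
  - elementary.all_columns_anomalies:
      timestamp_column: updated_at            # hypothetical column name
      where_expression: "status != 'test'"    # hypothetical filter
      anomaly_sensitivity: 3                  # illustrative value
      anomaly_direction: both
      detection_period:
        period: day
        count: 2
      training_period:
        period: day
        count: 14
      time_bucket:
        period: day
        count: 1
      seasonality: day_of_week
      ignore_small_changes:
        spike_failure_percent_threshold: 10   # illustrative value
        drop_failure_percent_threshold: 10    # illustrative value
```

Parameters that are left out, such as detection_delay or anomaly_exclude_metrics here, are simply not set in this sketch.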

Important Notes

  • No configuration is mandatory, but configuring a timestamp_column is highly recommended.

  • Use column_anomalies to specify which monitors to run (if not specified, all default monitors will run).

  • exclude_prefix and exclude_regexp can be used to exclude specific columns from the test.

  • The where_expression can be used to filter the data being tested.

  • Global sensitivity can be adjusted using the anomaly_sensitivity parameter.

  • Tags can be used to run Elementary tests in a dedicated run.
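The tagging approach from the last note could be set up roughly as follows. The sketch assumes dbt's standard test-level config block and tag selector. The model name and the tag name `elementary` are hypothetical choices.

```yaml
models:
  - name: orders                      # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          config:
            tags: ["elementary"]      # hypothetical tag name

# A dedicated run would then select only the tagged tests, for example:
#   dbt test --select tag:elementary
```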
