Column anomalies
The elementary.column_anomalies test executes column-level monitors and anomaly detection on a specific column. It checks the data type of the column and only executes monitors that are relevant to it.
How it works
The test analyzes the specified column in the table. It can analyze as many columns as you specify.
Based on the data type of the column, it applies relevant monitors.
You can specify which monitors to run using the column_anomalies parameter.
Default Monitors by Data Type
Monitor              Data type
null_count           any
null_percent         any
min_length           string
max_length           string
average_length       string
missing_count        string
missing_percent      string
min                  numeric
max                  numeric
average              numeric
zero_count           numeric
zero_percent         numeric
standard_deviation   numeric
variance             numeric
Opt-in monitors by type:
Monitor              Data type
sum                  numeric
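As a minimal sketch of monitor selection, the fragment below lists monitors explicitly for one string column and one numeric column. The column names (customer_email, amount) are hypothetical. It assumes that an opt-in monitor such as sum runs only when it is named in column_anomalies, and that an explicit list replaces the defaults rather than extending them.

```yaml
columns:
  - name: customer_email          # hypothetical string column
    tests:
      - elementary.column_anomalies:
          column_anomalies:       # string and "any" monitors from the table above
            - null_count
            - missing_percent
            - max_length

  - name: amount                  # hypothetical numeric column
    tests:
      - elementary.column_anomalies:
          column_anomalies:
            - average
            - zero_count
            - sum                 # opt-in monitor, assumed to run only when listed
```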
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
              time_bucket: # Daily by default
                period: < time period >
                count: < number of periods >

  - name: < model name >
    ## if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
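The template above could be filled in roughly as follows. This is an illustrative sketch only: the model name (orders), the column names (updated_at, amount) and the filter on status are hypothetical placeholders, not values from any real project.

```yaml
models:
  - name: orders                          # hypothetical model
    config:
      elementary:
        timestamp_column: updated_at      # hypothetical timestamp column
    columns:
      - name: amount                      # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_count
                - average
              # Only rows matching this SQL filter are monitored
              where_expression: "status = 'completed'"
              time_bucket:                # one-day buckets, the stated default
                period: day
                count: 1
```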
Test configuration
tests:
  - elementary.column_anomalies:
      column_anomalies: column monitors list
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
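For illustration, here is one way the detection parameters might be combined. All values are arbitrary examples chosen for this sketch, not recommended settings, and the comments describe the assumed meaning of each parameter. The column name updated_at is hypothetical.

```yaml
tests:
  - elementary.column_anomalies:
      column_anomalies:
        - zero_count
      timestamp_column: updated_at        # hypothetical column
      anomaly_sensitivity: 3              # example value; assumed to set how far from the expected range counts as anomalous
      anomaly_direction: both             # flag both spikes and drops
      detection_period:                   # window checked for anomalies
        period: day
        count: 2
      training_period:                    # history used as the baseline
        period: day
        count: 14
      seasonality: day_of_week
      ignore_small_changes:               # assumed: percent changes below these thresholds do not fail
        spike_failure_percent_threshold: 10
        drop_failure_percent_threshold: 10
```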
Important Notes
No configuration is mandatory, but configuring a timestamp_column is highly recommended.
Use column_anomalies to specify which monitors to run. If it is not specified, all default monitors run.
Use where_expression to filter the data being tested.
If no timestamp is configured, Elementary monitors the data without time filtering.
Tags can be used to run elementary tests on a dedicated run.
You can configure the test at the model level or at the column level.
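A sketch of the tagging approach, assuming the standard dbt mechanism of attaching tags through a test's config block. The tag name elementary and the column name amount are arbitrary choices for this example.

```yaml
columns:
  - name: amount                  # hypothetical column
    tests:
      - elementary.column_anomalies:
          config:
            tags: ["elementary"]  # arbitrary tag name used for selection
```

With a tag like this in place, a dedicated run could select only the tagged tests using dbt's tag selector, for example dbt test --select tag:elementary.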