Volume anomalies
The elementary.volume_anomalies test monitors the row count of your table over time, per time bucket. If configured without a timestamp_column, it monitors the table's total row count instead.
How it works
Data is split into time buckets (daily by default, configurable with the time_bucket field).
Row count is computed per bucket for the training period (the last 14 days by default, configurable with training_period).
The test compares the row count of each bucket within the detection period (the last 2 days by default, configurable with detection_period) to the row counts of the previous time buckets.
The test only analyzes completed time buckets. For example, with daily buckets, a test run in the middle of today treats yesterday as the most recent complete bucket; today's partial data is not evaluated.
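The bucketing steps above can be modeled in a few lines. This is a minimal sketch, not Elementary's implementation (Elementary runs as a dbt package and computes these metrics in SQL). It assumes rows arrive as naive Python datetimes, daily buckets, and the default 14-day training period and 2-day detection period; daily_row_counts and split_detection are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_row_counts(timestamps, now, training_days=14):
    """Count rows per daily bucket for the last `training_days` complete days.

    Hypothetical helper. Today's bucket (now.date()) is excluded because it
    is still incomplete. Returns (day, row_count) pairs, oldest first;
    days without rows get a count of 0.
    """
    per_day = Counter(ts.date() for ts in timestamps)
    last_complete = now.date() - timedelta(days=1)  # yesterday
    days = [last_complete - timedelta(days=i) for i in range(training_days)]
    return [(day, per_day.get(day, 0)) for day in reversed(days)]


def split_detection(buckets, detection_days=2):
    """Hypothetical helper: split into (earlier buckets, detection buckets)."""
    return buckets[:-detection_days], buckets[-detection_days:]


# Usage: a run at midday on 2024-05-15, so that day's bucket is incomplete.
now = datetime(2024, 5, 15, 12, 0)
rows = [datetime(2024, 5, 14, 9), datetime(2024, 5, 14, 17),
        datetime(2024, 5, 13, 8), datetime(2024, 5, 15, 10)]
buckets = daily_row_counts(rows, now)
previous, detection = split_detection(buckets)
print(detection)
# → [(datetime.date(2024, 5, 13), 1), (datetime.date(2024, 5, 14), 2)]
```

The row from 2024-05-15 is ignored because that bucket is not yet complete, and the 14 buckets include the 2 detection buckets, matching the description above.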
If any anomalies are detected during the detection period, the test will fail.
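This page does not spell out how a bucket is judged anomalous. Assuming a z-score style rule, where the expected range is the mean of the previous buckets plus or minus anomaly_sensitivity standard deviations (3 by default), the pass/fail decision could look like the sketch below. The function name and the use of the sample standard deviation are assumptions for illustration, not Elementary's API.

```python
import statistics


def detect_volume_anomalies(buckets, detection_days=2, sensitivity=3.0):
    """Return detection-period (bucket, count) pairs outside the expected range.

    Hypothetical helper. Each detection bucket is compared with all buckets
    before it; the expected range is mean +/- sensitivity * sample stdev.
    An empty result means the test would pass.
    """
    anomalies = []
    start = len(buckets) - detection_days
    for i in range(start, len(buckets)):
        bucket, count = buckets[i]
        history = [c for _, c in buckets[:i]]
        if len(history) < 2:
            continue  # too little history to compute a standard deviation
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        lower, upper = mean - sensitivity * stdev, mean + sensitivity * stdev
        if not lower <= count <= upper:
            anomalies.append((bucket, count))
    return anomalies


# Usage: 12 steady days followed by two detection days, the last one a drop.
history = [100, 104, 98, 101, 99, 102, 97, 103, 100, 101, 99, 100]
buckets = [(f"day {n}", c) for n, c in enumerate(history + [102, 40], start=1)]
print(detect_volume_anomalies(buckets))  # → [('day 14', 40)]
```

Day 13 (102 rows) falls inside the expected range, while day 14 (40 rows) does not, so under this assumed rule the test would fail.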
Configuration
models:
  - name: < model name >
    tests:
      - elementary.volume_anomalies:
          timestamp_column: < timestamp column >
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >

Example:

models:
  - name: login_events
    config:
      elementary:
        timestamp_column: "loaded_at"
    tests:
      - elementary.volume_anomalies:
          where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'"
          time_bucket:
            period: day
            count: 1
          # optional - use tags to run elementary tests on a dedicated run
          tags: ["elementary"]
          config:
            # optional - change severity
            severity: warn

  - name: users
    # if no timestamp is configured, elementary will monitor without time filtering
    tests:
      - elementary.volume_anomalies:
          tags: ["elementary"]

Test configuration
No configuration is mandatory; however, configuring a timestamp_column is highly recommended.
tests:
  - elementary.volume_anomalies:
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      fail_on_zero: [true | false]
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      detection_delay:
        period: [hour | day | week | month]
        count: int
      anomaly_exclude_metrics: [SQL expression]
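Several of the options above refine whether an out-of-range bucket actually fails. The sketch below models three of them for a single bucket, under assumed semantics: anomaly_direction limits failures to values above (spike) or below (drop) the expected range; fail_on_zero fails any zero count; and the ignore_small_changes thresholds require a minimum percent change from the training mean. The expected-range rule (mean plus or minus sensitivity standard deviations), the function refine_flag, and its parameter names are all hypothetical. seasonality, detection_delay and anomaly_exclude_metrics are not modeled.

```python
def refine_flag(value, mean, stdev, sensitivity=3.0, direction="both",
                fail_on_zero=False, spike_pct=None, drop_pct=None):
    """Decide whether one bucket fails, under assumed option semantics.

    Hypothetical helper.
    direction: "both", "spike" or "drop" (models anomaly_direction).
    fail_on_zero: fail whenever the value is 0 (models fail_on_zero).
    spike_pct / drop_pct: minimum percent change from the training mean
    before a spike / drop can fail (models ignore_small_changes).
    """
    if fail_on_zero and value == 0:
        return True
    upper = mean + sensitivity * stdev
    lower = mean - sensitivity * stdev
    is_spike = value > upper and direction in ("both", "spike")
    is_drop = value < lower and direction in ("both", "drop")
    # Small changes are ignored even when they fall outside the range.
    if is_spike and spike_pct is not None:
        is_spike = value >= mean * (1 + spike_pct / 100)
    if is_drop and drop_pct is not None:
        is_drop = value <= mean * (1 - drop_pct / 100)
    return is_spike or is_drop


# Usage: training mean 100, stdev 2, so the expected range is 94 to 106.
print(refine_flag(110, mean=100, stdev=2))                    # → True
print(refine_flag(110, mean=100, stdev=2, spike_pct=20))      # → False
print(refine_flag(110, mean=100, stdev=2, direction="drop"))  # → False
```

In this model, a 10% spike is outside the range but is ignored when spike_failure_percent_threshold is 20, and it is ignored entirely when only drops are monitored.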