Dimension anomalies
The elementary.dimension_anomalies test counts rows grouped by the given dimensions (columns or SQL expressions). It monitors the frequency of values in the configured dimensions over time and alerts on unexpected changes in the distribution. This test is best configured on low-cardinality fields.
How it works
If timestamp_column is configured, the distribution is collected per time_bucket. If not, the test counts the total rows per dimension.
The test alerts on unexpected changes in the distribution of dimension values over time.
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.dimension_anomalies:
          dimensions: < columns or sql expressions of columns >
          # optional - a where expression that narrows the rows included in dimension monitoring
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
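As an illustration, the template above might be filled in as follows. This is a sketch only: the model name login_events and the columns loaded_at, event_type, and country_name are hypothetical, and dimensions is written here as a YAML list on the assumption that several dimensions are monitored together.

```yaml
models:
  - name: login_events                  # hypothetical model name
    config:
      elementary:
        timestamp_column: loaded_at     # hypothetical timestamp column
    tests:
      - elementary.dimension_anomalies:
          dimensions:                   # hypothetical low-cardinality columns
            - event_type
            - country_name
          # optional - exclude rows that should not be monitored
          where_expression: "country_name != 'unwanted country'"
          time_bucket:                  # hourly buckets instead of the daily default
            period: hour
            count: 1
```

With this configuration, row counts per (event_type, country_name) combination would be collected for each hourly bucket of loaded_at.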
Test configuration
tests:
  - elementary.dimension_anomalies:
      dimensions: sql expression
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
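A concrete configuration using several of these parameters could look like the sketch below. All values are illustrative assumptions rather than recommended settings, and the column names updated_at and plan_type are hypothetical.

```yaml
tests:
  - elementary.dimension_anomalies:
      dimensions:
        - plan_type                     # hypothetical low-cardinality column
      timestamp_column: updated_at      # hypothetical timestamp column
      anomaly_sensitivity: 3            # illustrative value
      anomaly_direction: both           # alert on spikes and drops
      detection_period:                 # window checked for anomalies
        period: day
        count: 2
      training_period:                  # history used as the baseline
        period: day
        count: 30
      time_bucket:
        period: day
        count: 1
      seasonality: day_of_week
      ignore_small_changes:             # illustrative thresholds, in percent
        spike_failure_percent_threshold: 10
        drop_failure_percent_threshold: 10
```

Parameters left out of a test definition (here detection_delay and anomaly_exclude_metrics) are simply not set in this sketch.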
Important Notes
Required configuration: dimensions
The test is best suited for low-cardinality fields.
If timestamp_column is not configured, the test will monitor without time filtering.
Tags can be used to run elementary tests on a dedicated run.
Severity can be optionally changed in the config section.
The where_expression can be used to refine the scope of dimension monitoring.
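The notes on tags and severity can be sketched with standard dbt test properties. The tag name elementary and the column event_type are assumptions chosen for illustration.

```yaml
tests:
  - elementary.dimension_anomalies:
      dimensions:
        - event_type                    # hypothetical column
      tags: ["elementary"]              # assumed tag name, used to select these tests
      config:
        severity: warn                  # report anomalies as warnings instead of errors
```

Assuming the tag above, a dedicated run could then use dbt's tag selector, for example: dbt test --select tag:elementary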