Dimension anomalies

The elementary.dimension_anomalies test counts rows grouped by given dimensions (columns/expressions). It monitors the frequency of values in the configured dimension over time and alerts on unexpected changes in the distribution. This test is best configured on low-cardinality fields.

How it works

  • If timestamp_column is configured, the distribution is collected per time_bucket.

    • If not, it counts the total rows per dimension.

  • The test alerts on unexpected changes in the distribution of dimension values over time.

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.dimension_anomalies:
          dimensions: < columns or sql expressions of columns >
          # optional - a where expression that narrows the rows included in dimension monitoring
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
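
As an illustration, the template above might be filled in as follows. This is a minimal sketch: the model name login_events and the columns loaded_at, event_type, and country_name are hypothetical, and the bucket size is an arbitrary choice rather than a recommendation.

```yaml
models:
  - name: login_events                # hypothetical model name
    config:
      elementary:
        timestamp_column: loaded_at   # hypothetical timestamp column
    tests:
      - elementary.dimension_anomalies:
          dimensions:                 # hypothetical low-cardinality columns
            - event_type
            - country_name
          # optional: leave out rows that should not be monitored
          where_expression: "country_name != 'unwanted country'"
          time_bucket:                # count rows per 12-hour bucket
            period: hour
            count: 12
```

With this configuration, rows are counted per combination of event_type and country_name in each 12-hour bucket, using loaded_at to assign rows to buckets.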

Test configuration

tests:
  - elementary.dimension_anomalies:
    dimensions: sql expression
    timestamp_column: column name
    where_expression: sql expression
    anomaly_sensitivity: int
    anomaly_direction: [both | spike | drop]
    detection_period:
      period: [hour | day | week | month]
      count: int
    training_period:
      period: [hour | day | week | month]
      count: int
    time_bucket:
      period: [hour | day | week | month]
      count: int
    seasonality: day_of_week
    detection_delay:
      period: [hour | day | week | month]
      count: int
    ignore_small_changes:
      spike_failure_percent_threshold: int
      drop_failure_percent_threshold: int
    anomaly_exclude_metrics: [SQL expression]
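
For orientation, here is a sketch of how some of these parameters could be combined on a single test. The model and column names are hypothetical, and every value is illustrative only; none of them should be read as a default or a recommended setting.

```yaml
models:
  - name: login_events                  # hypothetical model name
    tests:
      - elementary.dimension_anomalies:
          dimensions:
            - event_type                # hypothetical column
          timestamp_column: loaded_at   # hypothetical column
          anomaly_sensitivity: 3        # illustrative value
          anomaly_direction: drop       # alert only on drops
          detection_period:
            period: day
            count: 2
          training_period:
            period: day
            count: 30
          time_bucket:
            period: day
            count: 1
          seasonality: day_of_week
          ignore_small_changes:
            spike_failure_percent_threshold: 10   # illustrative value
            drop_failure_percent_threshold: 10    # illustrative value
```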

Important Notes

  • Required configuration: dimensions

  • The test is best suited for low-cardinality fields.

  • If timestamp_column is not configured, the test counts total rows per dimension value, with no time filtering.

  • Tags can be used to run Elementary tests in a dedicated run.

  • Severity can optionally be changed in the test's config section.

  • The where_expression can be used to refine the scope of dimension monitoring.
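
The notes on tags, severity, and where_expression could look like this in practice. This is a sketch under assumptions: the model name, column name, filter value, and the tag name elementary are all hypothetical choices.

```yaml
models:
  - name: login_events                # hypothetical model name
    tests:
      - elementary.dimension_anomalies:
          dimensions:
            - event_type              # hypothetical column
          # narrow the monitored rows (hypothetical filter)
          where_expression: "event_type != 'healthcheck'"
          tags: ["elementary"]        # tag name is an arbitrary choice
          config:
            severity: warn            # report anomalies as warnings instead of errors
```

With a tag in place, dbt's tag selector can run only the tagged tests, for example dbt test --select tag:elementary.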
