# Anomaly Detection Tests

### Overview

Elementary data anomaly detection tests monitor specific metrics (such as row count, null rate, and average value) and compare recent values to historical data. This helps detect [significant changes and deviations](https://docs.elementary-data.com/data-tests/data-anomaly-detection) that likely indicate data reliability issues.

### How Anomaly Detection Works

#### Test Execution Process

1. The elementary package executes the relevant data monitors and collects metrics for the monitored dataset.
2. Data is split into time buckets based on the `time_bucket` field.
3. The collected data is limited by the `training_period` variable.
4. The test compares a specific metric (e.g., row count) of the buckets within the `detection_period` to all previous time buckets within the `training_period`.
5. If anomalies are detected in the detection period, the test fails.
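The steps above can be sketched in plain Python. This is a minimal illustration, not Elementary's actual implementation: the daily row counts, period lengths, and variable names mirror the configuration fields described here (`training_period`, `detection_period`, `anomaly_score_threshold`) but the data is hypothetical.

```python
from datetime import date, timedelta
import statistics

# Hypothetical daily row counts, one value per time bucket (time_bucket = day).
row_counts = {date(2024, 1, 1) + timedelta(days=i): c
              for i, c in enumerate([100, 102, 98, 101, 99, 103, 97, 100, 150])}

training_period = timedelta(days=14)   # how far back historical buckets are kept
detection_period = timedelta(days=2)   # recent buckets tested for anomalies

latest = max(row_counts)
detection_start = latest - detection_period
training_start = latest - training_period

# Split buckets into a training set (historical) and a detection set (recent).
training = [v for d, v in row_counts.items() if training_start <= d < detection_start]
detection = {d: v for d, v in row_counts.items() if d >= detection_start}

mean = statistics.mean(training)
stdev = statistics.stdev(training)

# Flag detection-period buckets whose metric deviates beyond the threshold.
anomaly_score_threshold = 3
anomalies = {d: v for d, v in detection.items()
             if abs((v - mean) / stdev) >= anomaly_score_threshold}
print(anomalies)  # the 150-row bucket stands out against ~100-row history
```

Here the final bucket's jump to 150 rows scores far outside the expected range built from the earlier buckets, so the test would fail.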

#### When a Test Fails

A test failure indicates that an anomaly was detected for the specific metric and dataset. For more details, refer to the [anomaly detection method](#data-anomaly-detection-method).

### Core concepts

<table data-view="cards"><thead><tr><th align="center"></th><th align="center"></th></tr></thead><tbody><tr><td align="center"><strong>Anomaly</strong></td><td align="center">A value in the detection set that is an outlier compared to the expected range calculated based on the training set.</td></tr><tr><td align="center"><strong>Monitored data set</strong></td><td align="center">The complete dataset used for the data monitor, including both training set and detection set values.</td></tr><tr><td align="center"><strong>Data monitors</strong></td><td align="center">Different metrics (freshness, volume, nullness, uniqueness, distribution, etc.) that we monitor to detect problems.</td></tr><tr><td align="center"><strong>Training set</strong></td><td align="center">The set of values used as a reference point to calculate the expected range.</td></tr><tr><td align="center"><strong>Detection set</strong></td><td align="center">The set of values compared to the expected range. Outliers in this set are flagged as anomalies.</td></tr><tr><td align="center"><strong>Expected range</strong></td><td align="center">The range of values considered normal, calculated based on the training set.</td></tr><tr><td align="center"><strong>Training period</strong></td><td align="center">The time period from which the training set is collected. This is typically a recent period, as data patterns may change over time.</td></tr><tr><td align="center"><strong>Detection period</strong></td><td align="center">The period containing values that are compared to the expected range.</td></tr><tr><td align="center"><strong>Time bucket</strong></td><td align="center">The consistent time intervals into which data is split for analysis. For example, daily buckets for monitoring row count anomalies.</td></tr></tbody></table>

### Data anomaly detection method

Elementary uses the "[standard score](https://en.wikipedia.org/wiki/Standard_score)" (Z-score) for anomaly detection. This score represents the number of standard deviations a value is from the mean of a set of values.

#### [Empirical rule](https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/empirical-rule-2/) in Normal Distribution

* **\~68%** of values have an absolute **z-score of 1 or less.**
* **\~95%** of values have an absolute **z-score of 2 or less.**
* **\~99.7%** of values have an absolute **z-score of 3 or less.**

Values with an absolute **standard score of 3 or above** are [**considered outliers**](https://www.ctspedia.org/do/view/CTSpedia/OutLier). This is Elementary's default threshold, which can be adjusted using the `anomaly_score_threshold` variable in the [global configuration](https://docs.elementary-data.com/data-tests/elementary-tests-configuration).
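A minimal sketch of the z-score check, assuming a hypothetical training set of daily null rates (the metric values and helper names are illustrative, not Elementary's internals):

```python
import statistics

# Hypothetical training-set values for a monitored metric (daily null rate).
training_set = [0.010, 0.012, 0.011, 0.009, 0.010, 0.013, 0.011, 0.010]

mean = statistics.mean(training_set)
stdev = statistics.stdev(training_set)

def z_score(value: float) -> float:
    """Number of standard deviations a value is from the training-set mean."""
    return (value - mean) / stdev

# Default threshold: values with |z| >= 3 are flagged as anomalies.
anomaly_score_threshold = 3
print(abs(z_score(0.011)) >= anomaly_score_threshold)  # typical value
print(abs(z_score(0.050)) >= anomaly_score_threshold)  # sudden spike
```

A typical value lands well inside the expected range, while a sudden spike scores many standard deviations out and would be flagged.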

#### Adjusting Sensitivity

Within your [Elementary Schema](https://docs.paradime.io/app-help/documentation/integrations/observability/elementary-data/pages/yl0XR7WEXpl53yxVXH1A#id-4.-build-elementary-models) in your data warehouse, query the `anomaly_sensitivity` model to see how different score thresholds would affect anomaly detection based on your last run's metric values. This can help you decide whether to adjust the sensitivity.

<figure><img src="/files/eFQG4dWww0Znbib0PocE" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}

### Best Practices

* Regularly review and adjust your anomaly detection settings based on your data patterns.
* Consider seasonality and known data fluctuations when interpreting results.
* Use anomaly detection in conjunction with other data quality tests for comprehensive monitoring.
{% endhint %}

