Anomalies are data points that differ markedly from the rest of the data set they belong to. Data scientists may want to identify anomalies to investigate what’s causing them, or to exclude them from calculations they can distort, such as means or standard deviations. Anomalies can be caused by instrument or measurement errors, or they can be valid data points that simply differ greatly from what’s expected. In either case, identifying anomalies is the first step to understanding them.

How to detect anomalies?

One way of detecting anomalies is to set thresholds beyond which a data point is classified as an anomaly. A common way of setting thresholds is to use multiples of the standard deviation of the data set. If a data set is normally distributed, about 99.7% of data points fall within three standard deviations of the mean, so points outside that range are candidates for anomalies. Statistical theory forms the basis of some common anomaly detection methods, such as z-scores and Grubbs’ test. Other methods use density-based techniques, correlation-based detection, or neural networks. New methods of detecting anomalies are still being developed, and different methods work better on different kinds of data sets.
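The standard-deviation threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `zscore_anomalies`, the `readings` sample data, and the default threshold of 3.0 are assumptions made for the example, and the sketch uses the sample standard deviation from Python's standard `statistics` module.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold.

    Hypothetical helper for illustration; assumes the data is roughly
    normally distributed, as the three-sigma rule requires.
    """
    if len(values) < 2:
        return []  # a standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [
        (i, x)
        for i, x in enumerate(values)
        if abs(x - mu) / sigma > threshold
    ]


# Made-up sensor readings: twenty values near 10 and one spike.
readings = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [50.0]
print(zscore_anomalies(readings))
```

One caveat with this approach: the anomaly itself inflates the mean and standard deviation it is measured against. With the sample standard deviation, no point in a set of n values can have a z-score larger than (n - 1) / sqrt(n), so in a set of ten points nothing can exceed a threshold of 3. Very small data sets may call for a lower threshold or a more robust statistic.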

Related resources

Introduction to Time Series Data

Anomaly detection with median absolute deviation

Data Science Tools for Time Series Data
