
Technical report | Uniform Calibration of Anomaly Detectors with Multiple Sub-Classes for Robust Performance

Executive Summary

Anomaly detection is the task of sorting data points into normal and anomalous classes. In a semi-supervised setting, only samples from the normal class are available during training. By computing an appropriate scalar score for each training sample and ranking these scores, the anomaly detector can be calibrated to produce a specified false alarm rate. However, this procedure controls only the global false alarm rate. As a result, certain (normal) sub-classes can be misclassified excessively, and the detector can become miscalibrated if the proportions of the sub-classes drift.
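The single-threshold calibration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's implementation. The one-dimensional data, the sub-class names `A` and `B`, the single Gaussian model, and the use of the squared z-score as the anomaly score (higher means more anomalous) are all hypothetical. The threshold is the empirical (1 - alpha) quantile of the training scores, so on the training data the global false alarm rate is at most alpha. However, the small sub-class absorbs most of the false alarms.

```python
import math
import random
import statistics


def calibrate_threshold(scores, alpha):
    """Empirical (1 - alpha) quantile of normal-class anomaly scores.

    Assumed convention: a higher score is more anomalous, and a point is
    flagged when its score is strictly greater than the threshold, so at most
    a fraction alpha of `scores` is flagged.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    ranked = sorted(scores)
    k = math.ceil((1.0 - alpha) * len(ranked)) - 1
    return ranked[max(k, 0)]


def false_alarm_rate(scores, threshold):
    return sum(s > threshold for s in scores) / len(scores)


random.seed(0)
# Hypothetical normal training data: a large sub-class A and a small sub-class B.
data = {
    "A": [random.gauss(0.0, 1.0) for _ in range(950)],
    "B": [random.gauss(4.0, 1.0) for _ in range(50)],
}
pooled = data["A"] + data["B"]

# One global model: a single Gaussian fitted to all normal data. The anomaly
# score is the squared z-score, a monotone function of the negative log-density.
mu, sigma = statistics.fmean(pooled), statistics.pstdev(pooled)


def score(x):
    return ((x - mu) / sigma) ** 2


alpha = 0.05
tau = calibrate_threshold([score(x) for x in pooled], alpha)
global_far = false_alarm_rate([score(x) for x in pooled], tau)
per_class_far = {k: false_alarm_rate([score(x) for x in xs], tau) for k, xs in data.items()}

print(f"global false alarm rate: {global_far:.3f}")
for name, rate in per_class_far.items():
    print(f"  sub-class {name}: {rate:.3f}")
```

Running this typically gives a global rate of about 0.05. Sub-class `B` is flagged far more often than sub-class `A`, although the exact figures depend on the random seed.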

In this technical report, we propose a method for constructing an anomaly detector with a uniform false alarm rate for each sub-class. An anomaly detector is trained independently for each sub-class and tuned to a specified false alarm rate. These sub-detectors are then combined into a single anomaly detector, which flags a point as anomalous if and only if every sub-detector identifies it as an anomaly. A normal point is therefore flagged only when its own sub-class detector also rejects it. Consequently, the combined detector has a false alarm rate no greater than the specified rate, regardless of any drift in the sub-class distribution. This approach brings a number of benefits:

  • It protects against imbalance in the sub-classes. The traditional practice of using a single threshold can cause sub-classes with few samples to be consistently misclassified as anomalies. This does not happen with separate thresholds, although small sample sizes can still degrade performance.
  • It can produce more robust results when there are concerns about algorithmic bias or discrimination against particular sub-classes, for example when sub-classes represent populations with socially protected attributes.
  • It protects against drift in the sampling distribution. The false alarm rate will remain nearly constant, even if the proportion of each sub-class changes dramatically.
  • It allows different algorithms or metrics to be used for each sub-class. For instance, one sub-class could use a Gaussian Naïve Bayes model with the probability density as the anomaly metric, while another uses an encoder-decoder neural network with the reconstruction error.
  • The ensemble anomaly detector can be easily customized. Sub-class anomaly detectors can be added or removed without having to retrain any other components. This is not true when a single global threshold is used.
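The following is a minimal sketch of the per-sub-class alternative, again using hypothetical one-dimensional Gaussian sub-detectors. The class `GaussianSubDetector`, the helper `ensemble_is_anomaly`, and the data are illustrative assumptions, not the report's code. Each sub-detector is calibrated on its own sub-class, and the ensemble flags a point only when every sub-detector rejects it. A normal point from sub-class k can therefore be flagged only if detector k rejects it. This bounds each sub-class's false alarm rate by alpha. Because the overall rate is a weighted average of the per-sub-class rates, it is also bounded by alpha, whatever the sub-class proportions.

```python
import math
import random
import statistics


def calibrate_threshold(scores, alpha):
    """Empirical (1 - alpha) quantile: at most a fraction alpha of scores exceed it."""
    ranked = sorted(scores)
    return ranked[max(math.ceil((1.0 - alpha) * len(ranked)) - 1, 0)]


class GaussianSubDetector:
    """Hypothetical sub-class detector: 1-D Gaussian fit, score = squared z-score."""

    def __init__(self, samples, alpha):
        self.mu = statistics.fmean(samples)
        self.sigma = statistics.pstdev(samples)
        # Calibrate on this sub-class only, so its own false alarm rate is ~alpha.
        self.threshold = calibrate_threshold([self.score(x) for x in samples], alpha)

    def score(self, x):
        return ((x - self.mu) / self.sigma) ** 2

    def is_anomaly(self, x):
        return self.score(x) > self.threshold


def ensemble_is_anomaly(detectors, x):
    # Flag only if EVERY sub-detector rejects x; one acceptance makes x normal.
    return all(d.is_anomaly(x) for d in detectors.values())


def far(detectors, xs):
    return sum(ensemble_is_anomaly(detectors, x) for x in xs) / len(xs)


random.seed(1)
alpha = 0.05
means = {"A": 0.0, "B": 4.0}  # hypothetical sub-class centres
train = {
    "A": [random.gauss(means["A"], 1.0) for _ in range(1800)],
    "B": [random.gauss(means["B"], 1.0) for _ in range(200)],
}
detectors = {k: GaussianSubDetector(xs, alpha) for k, xs in train.items()}

# Per-sub-class false alarm rate on the calibration data: bounded by alpha.
train_far = {k: far(detectors, xs) for k, xs in train.items()}

# Drift: the test stream is now 10% A and 90% B (the reverse of training).
stream = [random.gauss(means["A"], 1.0) for _ in range(1000)] + [
    random.gauss(means["B"], 1.0) for _ in range(9000)
]
drifted_far = far(detectors, stream)
print(train_far, f"drifted stream: {drifted_far:.3f}")
```

On the calibration data the bound holds exactly. On new data, such as the drifted stream, the observed rate is only approximately alpha, because each threshold is estimated from a finite sample. That estimate is noisier for the smaller sub-class.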

The false positive and false negative rates can be estimated by validating each sub-class detector against all of the data points, as shown in Figure 1. Points from the other sub-classes act as surrogate anomalies for each sub-detector. This allows cross-validation to be applied to a semi-supervised learning technique, where it would ordinarily require labelled anomalies (that is, supervised learning). While the actual false negative rate will depend on the specific anomalies encountered, this provides a general method for assessing the properties of an anomaly detector.
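The validation scheme can be sketched as a matrix of rejection rates, with one row per sub-detector and one column per sub-class. This is a hypothetical one-dimensional illustration, not the report's MNIST experiment. The functions `fit_detector` and `rejection_matrix` and the sub-class centres are assumptions. Diagonal entries estimate each sub-detector's false alarm rate. An off-diagonal entry (i, j) estimates how often detector i would catch sub-class j if that sub-class were anomalous, so one minus that entry is the corresponding false negative rate.

```python
import math
import random
import statistics


def fit_detector(samples, alpha):
    """Hypothetical sub-detector: Gaussian fit plus a squared z-score threshold
    calibrated so that about a fraction alpha of the training samples is rejected."""
    mu, sigma = statistics.fmean(samples), statistics.pstdev(samples)
    ranked = sorted(((x - mu) / sigma) ** 2 for x in samples)
    tau = ranked[max(math.ceil((1.0 - alpha) * len(ranked)) - 1, 0)]

    def is_anomaly(x):
        return ((x - mu) / sigma) ** 2 > tau  # True means "reject as anomaly"

    return is_anomaly


def rejection_matrix(detectors, validation):
    """M[i][j] = fraction of sub-class j validation points rejected by detector i."""
    return {
        i: {j: sum(det(x) for x in xs) / len(xs) for j, xs in validation.items()}
        for i, det in detectors.items()
    }


def draw(centre, n):
    return [random.gauss(centre, 1.0) for _ in range(n)]


random.seed(2)
alpha = 0.05
centres = {"A": 0.0, "B": 1.0, "C": 6.0}  # hypothetical; A and B overlap
train = {k: draw(c, 2000) for k, c in centres.items()}
valid = {k: draw(c, 2000) for k, c in centres.items()}  # held-out points

detectors = {k: fit_detector(xs, alpha) for k, xs in train.items()}
M = rejection_matrix(detectors, valid)
print("      " + "     ".join(centres))
for i, row in M.items():
    print(i, "  ".join(f"{row[j]:.2f}" for j in centres))
```

Because sub-classes `A` and `B` overlap, detector `A` rejects only a small fraction of `B` points. The well-separated sub-class `C` is rejected almost entirely by the other detectors. Reading the matrix row by row shows which sub-detectors cannot separate similar sub-classes.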

Cross-validation can also provide a level of granular insight not normally available. We demonstrate this by training a convolutional autoencoder on the MNIST data set. The sub-detector for '1's effectively rejects other digits, while the '7' and '9' detectors are unable to recognize '1's as anomalies. These kinds of insights can guide further improvement of the anomaly detector. The sub-class detectors we constructed appear to somewhat reduce the ability to detect anomalies (that is, they increase Type-II errors). Nevertheless, this approach provides a promising direction for building more robust and flexible anomaly detectors.

Key information

Author: T. L. Keevers
Publication number:
Publication type: Technical report
Publish Date: September 2020
Classification: Unclassified - public release
Keywords: Machine learning, Interpretability, Statistics