Using labeled data to evaluate change detectors in a multivariate streaming environment

Authors:
Albert Y. Kim; Caren Marzban; Donald B. Percival; Werner Stuetzle
Affiliations:
Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322, USA; Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322, USA and Applied Physics Laboratory, Box 355640, University of Washington, Seattle, WA 98195-5640, USA; Applied Physics Laboratory, Box 355640, University of Washington, Seattle, WA 98195-5640, USA and Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322, USA; Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322, USA
Venue:
Signal Processing
Year:
2009

Abstract

We consider the problem of detecting changes in a multivariate data stream. A change detector is defined by a detection algorithm and an alarm threshold. A detection algorithm maps the stream of input vectors into a univariate detection stream. The detector signals a change when the detection stream exceeds the chosen alarm threshold. We consider two aspects of the problem: (1) setting the alarm threshold and (2) measuring/comparing the performance of detection algorithms. We assume we are given a segment of the stream where changes of interest are marked. We present evidence that, without such marked training data, it might not be possible to accurately estimate the false alarm rate for a given alarm threshold. Commonly used approaches assume the data stream consists of independent observations, an implausible assumption given the time series nature of the data. Lack of independence can lead to estimates that are badly biased. Marked training data can also be used for realistic comparison of detection algorithms. We define a version of the receiver operating characteristic curve adapted to the change detection problem and propose a block bootstrap for comparing such curves. We illustrate the proposed methodology using multivariate data derived from an image stream.
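The definitions in the abstract (a detection algorithm that maps input vectors to a univariate detection stream, an alarm threshold, and the use of marked changes to estimate the false alarm rate) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the two-window mean-difference detector, the function names, the definition of the false alarm rate as the fraction of unmarked time points that raise an alarm, and the generic moving-block resampler, which only shows how resampling whole blocks preserves short-range dependence and is not the authors' procedure for comparing curves.

```python
import numpy as np


def detection_stream(x, window):
    """Hypothetical detection algorithm (not from the paper).

    Maps an (n, p) stream of input vectors to a univariate stream: at each
    time t, the Euclidean distance between the mean of the most recent
    `window` vectors and the mean of the `window` vectors before them.
    Entries before two full windows are available are left at 0.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    d = np.zeros(n)
    for t in range(2 * window, n + 1):
        recent = x[t - window:t].mean(axis=0)
        reference = x[t - 2 * window:t - window].mean(axis=0)
        d[t - 1] = np.linalg.norm(recent - reference)
    return d


def alarms(d, threshold):
    """The detector signals a change wherever the stream exceeds the threshold."""
    return np.asarray(d) > threshold


def false_alarm_rate(d, threshold, change_mask):
    """Assumed definition: fraction of time points NOT marked as a change
    of interest at which the detector nevertheless raises an alarm."""
    quiet = ~np.asarray(change_mask, dtype=bool)
    return float(np.mean(np.asarray(d)[quiet] > threshold))


def moving_block_resample(d, block_len, rng):
    """Generic moving-block bootstrap resample of a univariate stream.

    Draws contiguous blocks of length `block_len` with replacement and
    concatenates them, truncating to the original length.
    """
    d = np.asarray(d)
    n = len(d)
    n_blocks = -(-n // block_len)  # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:n]
    return d[idx]
```

A typical use would compute `d = detection_stream(x, window)` on the marked segment, evaluate `false_alarm_rate(d, threshold, change_mask)` over a grid of thresholds, and repeat the evaluation on resamples from `moving_block_resample` to gauge variability. Resampling blocks rather than single observations is the design choice that matters here, since the abstract notes that treating the stream as independent observations can badly bias the estimates.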