Change (Detection) You Can Believe in: Finding Distributional Shifts in Data Streams

  • Authors:
  • Tamraparni Dasu; Shankar Krishnan; Dongyu Lin; Suresh Venkatasubramanian; Kevin Yi

  • Affiliations:
  • AT&T Labs - Research; AT&T Labs - Research; University of Pennsylvania; University of Utah; Hong Kong University of Science and Technology

  • Venue:
  • IDA '09 Proceedings of the 8th International Symposium on Intelligent Data Analysis: Advances in Intelligent Data Analysis VIII
  • Year:
  • 2009

Abstract

Data streams are dynamic, with frequent distributional changes. In this paper, we propose a statistical approach to detecting distributional shifts in multi-dimensional data streams. We use relative entropy, also known as the Kullback-Leibler distance, to measure the statistical distance between two distributions. In the context of a multi-dimensional data stream, the distributions are generated by data from two sliding windows. We maintain a sample of the data from the stream inside the windows to build the distributions. Our algorithm is streaming, nonparametric, and requires no distributional or model assumptions. It employs the statistical theory of hypothesis testing and bootstrapping to determine whether the distributions are statistically different. We provide a full suite of experiments on synthetic data to validate the method and demonstrate its effectiveness on data from real-life applications.
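
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a fixed-width grid to discretize the multi-dimensional points, additive smoothing to keep the relative entropy finite, and a pooled bootstrap to estimate the critical value. It also compares only one pair of windows and does not maintain sliding windows over a stream. All function names and parameters (`histogram`, `kl_distance`, `shift_detected`, `width`, `smoothing`, `n_boot`, `alpha`) are hypothetical.

```python
import math
import random
from collections import Counter


def histogram(points, width):
    """Count points per grid cell; each point is a tuple of floats.

    Assumption: a fixed-width grid stands in for whatever space
    partitioning the paper actually uses.
    """
    return Counter(tuple(math.floor(x / width) for x in p) for p in points)


def kl_distance(p_hist, q_hist, smoothing=0.5):
    """Relative entropy D(P || Q) over the union of occupied cells.

    Additive smoothing (an assumption) avoids log(0) and division by zero.
    """
    cells = set(p_hist) | set(q_hist)
    p_total = sum(p_hist.values()) + smoothing * len(cells)
    q_total = sum(q_hist.values()) + smoothing * len(cells)
    d = 0.0
    for c in cells:
        p = (p_hist.get(c, 0) + smoothing) / p_total
        q = (q_hist.get(c, 0) + smoothing) / q_total
        d += p * math.log(p / q)
    return d


def shift_detected(reference, current, width=0.5, n_boot=200, alpha=0.01, rng=None):
    """Test whether two windows differ, using a pooled bootstrap.

    Returns (flag, observed_distance, critical_value).
    """
    rng = rng or random.Random(0)
    observed = kl_distance(histogram(reference, width), histogram(current, width))

    # Under the null hypothesis both windows come from one distribution,
    # so resample pseudo-window pairs from the pooled data.
    pooled = list(reference) + list(current)
    null = []
    for _ in range(n_boot):
        a = rng.choices(pooled, k=len(reference))
        b = rng.choices(pooled, k=len(current))
        null.append(kl_distance(histogram(a, width), histogram(b, width)))

    # Critical value: approximate (1 - alpha) quantile of the null distances.
    null.sort()
    critical = null[min(n_boot - 1, math.ceil((1 - alpha) * n_boot))]
    return observed > critical, observed, critical
```

Calling `shift_detected(reference_window, current_window)` on two lists of equal-dimension tuples returns the decision together with the observed distance and the bootstrap critical value. The cell width and the number of bootstrap replicates trade off sensitivity against cost and would need tuning for real data.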