Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Elements of information theory
Elements of information theory
Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
A framework for measuring changes in data characteristics
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining time-changing data streams
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Sampling from a moving window over streaming data
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Information Theory: Coding Theorems for Discrete Memoryless Systems
Information Theory: Coding Theorems for Discrete Memoryless Systems
Representing and Querying Changes in Semistructured Data
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Finding surprising patterns in a time series database in linear time and space
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Bursty and Hierarchical Structure in Streams
Data Mining and Knowledge Discovery
A framework for diagnosing changes in evolving data streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Distributional clustering of English words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Online outlier detection in sensor data using non-parametric models
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Statistical change detection for multi-dimensional data
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Detecting change in data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Learning model trees from evolving data streams
Data Mining and Knowledge Discovery
Robustness of change detection algorithms
IDA'11 Proceedings of the 10th international conference on Advances in intelligent data analysis X
Hi-index | 0.00 |
Data streams are dynamic, with frequent distributional changes. In this paper, we propose a statistical approach to detecting distributional shifts in multi-dimensional data streams. We use relative entropy, also known as the Kullback-Leibler distance, to measure the statistical distance between two distributions. In the context of a multi-dimensional data stream, the distributions are generated by data from two sliding windows. We maintain a sample of the data from the stream inside the windows to build the distributions. Our algorithm is streaming, nonparametric, and requires no distributional or model assumptions. It employs the statistical theory of hypothesis testing and bootstrapping to determine whether the distributions are statistically different. We provide a full suite of experiments on synthetic data to validate the method and demonstrate its effectiveness on data from real-life applications.