A framework for diagnosing changes in evolving data streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
An Adaptive Learning Approach for Noisy Data Streams
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Maintaining the Maximum Normalized Mean and Applications in Data Stream Mining
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Supporting Customer Retention through Real-Time Monitoring of Individual Web Usage
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Class Specific Fuzzy Decision Trees for Mining High Speed Data Streams
Fundamenta Informaticae
Data Mining and Knowledge Discovery
Prediction and change detection in sequential data for interactive applications
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Detecting changes in unlabeled data streams using martingale
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
In a data streaming setting, data points are observed one at a time, and the concepts to be learned from them may change infinitely often as the stream evolves. In this paper, we extend the idea of testing exchangeability online (Vovk et al., 2003) to a martingale framework for detecting concept changes in time-varying data streams. Two martingale tests are developed to detect concept changes using: (i) martingale values, a direct consequence of Doob's Maximal Inequality, and (ii) the martingale difference, justified using the Hoeffding-Azuma Inequality. Under some assumptions, the second test theoretically has a lower probability than the first of rejecting the null hypothesis, "no concept change in the data stream", when it is in fact true (that is, a lower false-alarm probability). Experiments show that both martingale tests are effective in detecting concept changes in time-varying data streams simulated using two synthetic data sets and three benchmark data sets.
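To make the first test concrete, here is a minimal sketch of a martingale-value detector built on the randomized power martingale used for testing exchangeability online. It is an illustration under stated assumptions, not the authors' implementation: the strangeness measure (distance to the mean of the current window), the values `epsilon=0.92` and `threshold=20.0`, the reset-after-alarm policy, and the function names `randomized_p_value` and `detect_changes` are all hypothetical choices made for this example. Under exchangeability the martingale starts at 1 and is non-negative, so Doob's Maximal Inequality bounds the probability that it ever reaches a threshold λ by 1/λ. The second test, based on the martingale difference, is not sketched here.

```python
import random


def randomized_p_value(scores, theta):
    """Randomized p-value of the newest strangeness score (last element)."""
    newest = scores[-1]
    greater = sum(1 for s in scores if s > newest)
    equal = sum(1 for s in scores if s == newest)  # includes the newest point
    return (greater + theta * equal) / len(scores)


def detect_changes(stream, epsilon=0.92, threshold=20.0, seed=0):
    """Return the indices at which the martingale value reaches `threshold`.

    `epsilon`, `threshold`, and the strangeness measure are illustrative
    assumptions, not values or definitions taken from the abstract.
    """
    rng = random.Random(seed)
    alarms = []
    window = []        # points observed since the last alarm
    martingale = 1.0   # M_0 = 1
    for index, x in enumerate(stream):
        window.append(x)
        # Hypothetical strangeness: distance to the mean of the current window.
        mean = sum(window) / len(window)
        scores = [abs(v - mean) for v in window]
        theta = 1.0 - rng.random()  # uniform on (0, 1], keeps the p-value positive
        p = randomized_p_value(scores, theta)
        # Randomized power martingale: M_n = M_{n-1} * epsilon * p^(epsilon - 1).
        martingale *= epsilon * p ** (epsilon - 1.0)
        if martingale >= threshold:
            # Reject "no concept change", then restart from a clean state.
            alarms.append(index)
            window = []
            martingale = 1.0
    return alarms


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Synthetic stream: the mean shifts from 0 to 5 at index 200.
    stream = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
    stream += [data_rng.gauss(5.0, 1.0) for _ in range(200)]
    print(detect_changes(stream))
```

Recomputing every strangeness score against the current window mean keeps the scores symmetric in the observed points, which is what makes the p-values valid under exchangeability; the cost is quadratic time, so a practical detector would need an incremental strangeness measure.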