Change detection in learning histograms from data streams
EPIA'07 Proceedings of the 13th Portuguese Conference on Progress in Artificial Intelligence
Histograms are a common technique for density estimation and have been widely used as a tool in exploratory data analysis. Learning histograms from static, stationary data is a well-known topic. Nevertheless, very few works discuss this problem when data arrive as a continuous flow generated by dynamic environments. The scope of this paper is the detection of changes in high-speed, time-changing data streams. To address this problem, we construct histograms that process each example once, at the rate at which examples arrive. The main goal of this work is to continuously maintain a histogram consistent with the current state of nature. We study strategies to detect changes in the distribution generating the examples and to adapt the histogram to the most recent data by forgetting outdated data. We use the Partition Incremental Discretization algorithm, which was designed to learn histograms from high-speed data streams. We present a method to detect when a change occurs in the distribution generating the examples. The basic idea is to monitor distributions from two different time windows: the reference window, reflecting the distribution observed in the past, and the current window, which receives the most recent data. The current window is cumulative and can have a fixed or an adaptive step, depending on the distance between the distributions. We compare the two distributions using the Kullback-Leibler divergence, defining a threshold for the change detection decision based on the asymmetry of this measure. We evaluate our algorithm on controlled artificial data sets and compare the proposed approach with nonparametric tests. We also present results with real-world data sets from industrial and medical domains. These results suggest that an adaptive window step yields a high probability of change detection and faster detection, with few false-positive alarms.
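The two-window comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the decision rule, so the sketch assumes one plausible reading: a change is flagged when the absolute difference between the two directed Kullback-Leibler divergences exceeds a threshold. The function names, the fixed bin edges, the smoothing constant eps and the threshold value 0.05 are all hypothetical, and plain fixed-bin counting stands in for the Partition Incremental Discretization algorithm.

```python
import math
from bisect import bisect_right


def histogram(values, edges):
    """Count values into len(edges) - 1 bins; values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # The last bin is closed on the right, so clamp the index.
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    return counts


def to_probs(counts, eps=1e-6):
    """Normalize counts to probabilities; eps (an assumption) avoids empty bins."""
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl(p, q):
    """Directed Kullback-Leibler divergence KL(p || q), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def change_detected(reference, current, edges, threshold=0.05):
    """Assumed rule: flag a change when |KL(p||q) - KL(q||p)| > threshold."""
    p = to_probs(histogram(reference, edges))  # reference window (past data)
    q = to_probs(histogram(current, edges))    # current window (recent data)
    asymmetry = abs(kl(p, q) - kl(q, p))
    return asymmetry > threshold, asymmetry


if __name__ == "__main__":
    edges = [0, 1, 2, 3, 4]
    reference = [0.5, 1.5, 2.5, 3.5] * 25        # evenly spread over the bins
    drifted = [0.5] * 97 + [1.5, 2.5, 3.5]       # mass concentrated in one bin
    print(change_detected(reference, list(reference), edges))
    print(change_detected(reference, drifted, edges))
```

In a streaming setting, the current window would grow cumulatively and the check would be repeated after each step. The adaptive choice of that step, which the abstract reports as the better-performing variant, is not modeled here.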