The major challenge in mining data streams is concept drift: the tendency of the underlying data-generating process to change over time. In this paper, we propose a general rule learning framework that efficiently handles concept-drifting data streams while maintaining a highly accurate classification model. The main idea is to focus on partial drifts by letting individual rules monitor the stream and detect whether a drift has occurred in the regions they cover. A rule quality measure then decides whether the affected rules are inconsistent with the drifted concept, and the model is updated to retain only rules consistent with the newly arrived concept. A dynamically maintained set of instances deemed relevant to the most recent concept is also kept in memory; learning a new concept from this larger set of instances reduces the variance of the data distribution and yields a more accurate, stable classification model. Our experiments show that this approach not only handles drift efficiently but also achieves higher classification accuracy than other competitive approaches on a variety of real and synthetic data sets.
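The idea of per-rule drift monitoring can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the `Rule` class, the threshold-based `drifted` check (standing in for the paper's rule quality measure), the window size, and the toy single-feature rule induction are all assumptions made for the example.

```python
from collections import deque

class Rule:
    """A single classification rule covering a region of the input space."""
    def __init__(self, feature, threshold, label):
        self.feature = feature      # feature index the rule tests
        self.threshold = threshold  # rule fires when x[feature] <= threshold
        self.label = label          # class predicted inside the covered region
        self.hits = 0               # stream instances covered so far
        self.errors = 0             # misclassifications inside the region

    def covers(self, x):
        return x[self.feature] <= self.threshold

    def drifted(self, min_hits=30, max_error=0.4):
        # Simple stand-in for a rule quality measure: declare drift when the
        # error rate inside the rule's own region exceeds a threshold.
        return self.hits >= min_hits and self.errors / self.hits > max_error

class RuleStreamClassifier:
    """Rule set where each rule monitors drift only in the region it covers."""
    def __init__(self, window=200):
        self.rules = []
        self.window = deque(maxlen=window)  # recent instances kept in memory

    def predict(self, x):
        for r in self.rules:
            if r.covers(x):
                return r.label
        return None  # no covering rule

    def learn_one(self, x, y):
        self.window.append((x, y))
        kept = []
        for r in self.rules:
            if r.covers(x):             # only rules covering x update stats
                r.hits += 1
                if r.label != y:
                    r.errors += 1
            if not r.drifted():
                kept.append(r)          # drop rules inconsistent with drift
        self.rules = kept
        if self.predict(x) is None:     # grow a rule for uncovered regions
            self.rules.append(self._induce_rule(x, y))

    def _induce_rule(self, x, y):
        # Toy induction: threshold the first feature at the current instance.
        # A real learner would induce rules from the stored window instead.
        return Rule(feature=0, threshold=x[0], label=y)
```

Because drift detection is local to each rule, a partial drift invalidates only the rules covering the affected region; unaffected rules keep classifying, and dropped regions are relearned from the recent-instance window.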