An Aggregate Ensemble for Mining Concept Drifting Data Streams with Noise

Authors:
Peng Zhang;Xingquan Zhu;Yong Shi;Xindong Wu
Affiliations:
FEDS Center, Chinese Academy of Sciences, Beijing, China 100190;Dept. of Computer Sci. & Eng., Florida Atlantic Univ., Boca Raton, USA 33431;FEDS Center, Chinese Academy of Sciences, Beijing, China 100190 and College of Inform. Sci. & Tech., Univ. of Nebraska at Omaha, Omaha, USA NE 68182;Dept. of Computer Science, University of Vermont, Burlington, Vermont, USA 05405
Venue:
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2009

Abstract

Recent years have witnessed a large body of research on mining concept-drifting data streams. A primary assumption in this work is that the up-to-date data chunk and the yet-to-come data chunk share identical distributions, so classifiers that perform well on the up-to-date chunk should also predict accurately on the yet-to-come chunk. This "stationary assumption", however, does not capture the reality of concept drift in data streams. More recently, a "learnable assumption" has been proposed that allows the distribution of each data chunk to evolve randomly. Although this assumption can describe concept drift in data streams, it is still inadequate for real-world data streams, which usually suffer from noisy data as well as drifting concepts. In this paper, we propose a Realistic Assumption, which asserts that the difficulties of mining data streams are mainly caused by both concept drift and noisy data chunks. Accordingly, we present a new Aggregate Ensemble (AE) framework, which trains base classifiers using different learning algorithms on different data chunks. All the base classifiers are then combined into a classifier ensemble through model averaging. Experimental results on synthetic and real-world data show that AE is superior to other ensemble methods under our new realistic assumption for noisy data streams.
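
The aggregation idea in the abstract (several learning algorithms, each trained on several data chunks, combined by model averaging) can be illustrated with a minimal sketch. This is not the authors' implementation. The two toy learners (`NearestCentroid`, `GaussianNB`), the helper names, the synthetic chunks, and the use of uniform weights in the average are all assumptions made for illustration.

```python
import math
import random


class NearestCentroid:
    """Toy base learner (assumed): softmax over negative centroid distances."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        scores = {c: math.exp(-math.dist(x, mu)) for c, mu in self.centroids.items()}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}


class GaussianNB:
    """Toy base learner (assumed): Gaussian naive Bayes with variance smoothing."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                         for col, m in zip(cols, means)]
            self.stats[label] = (len(rows) / len(y), means, variances)
        return self

    def predict_proba(self, x):
        log_scores = {}
        for label, (prior, means, variances) in self.stats.items():
            ll = math.log(prior)
            for v, m, var in zip(x, means, variances):
                ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            log_scores[label] = ll
        top = max(log_scores.values())  # shift for numerical stability
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(exp.values())
        return {c: e / total for c, e in exp.items()}


def train_aggregate_ensemble(chunks, learner_factories):
    """Train every learning algorithm on every chunk: m algorithms x n chunks."""
    return [make().fit(X, y) for X, y in chunks for make in learner_factories]


def predict_proba(ensemble, x, labels):
    """Uniform model averaging of class probabilities (weights are an assumption)."""
    avg = {c: 0.0 for c in labels}
    for model in ensemble:
        for c, p in model.predict_proba(x).items():
            avg[c] += p / len(ensemble)
    return avg


def make_chunk(rng, n, noise_rate):
    """Synthetic two-class chunk; noise_rate is the fraction of flipped labels."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        center = 2.0 if label == 1 else -2.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        if rng.random() < noise_rate:
            label = 1 - label  # inject class noise
        y.append(label)
    return X, y


rng = random.Random(0)
# Three historical chunks; the middle one carries 30% label noise.
chunks = [make_chunk(rng, 200, noise) for noise in (0.0, 0.3, 0.0)]
ensemble = train_aggregate_ensemble(chunks, [NearestCentroid, GaussianNB])
probs = predict_proba(ensemble, [2.0, 2.0], labels=(0, 1))
prediction = max(probs, key=probs.get)
print(len(ensemble), prediction)
```

Here the ensemble holds six base classifiers (two algorithms on three chunks). Averaging across both algorithms and chunks means that no single noisy chunk or poorly suited learner determines the prediction on its own, which is the intuition the abstract attributes to the AE framework.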