Classifying noisy data streams

Authors:
Yong Wang;Zhanhuai Li;Yang Zhang
Affiliations:
Dept. Computer Science & Software, Northwestern Polytechnical University, P.R. China;Dept. Computer Science & Software, Northwestern Polytechnical University, P.R. China;School of Information Engineering, Northwest A&F University, P.R. China
Venue:
FSKD'06 Proceedings of the Third international conference on Fuzzy Systems and Knowledge Discovery
Year:
2006

References (9):

Learning in the presence of concept drift and hidden contexts. Machine Learning.
Mining high-speed data streams. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
Mining time-changing data streams. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
A streaming ensemble algorithm (SEA) for large-scale classification. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
Probabilistic Noise Identification and Data Cleaning. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
Mining concept-drifting data streams using ensemble classifiers. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Systematic data selection to mine concept-drifting data streams. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
Boosting in the presence of noise. Journal of Computer and System Sciences - Special issue: Learning theory 2003.
Online classification of nonstationary data streams. Intelligent Data Analysis.

Cited by (2):

Robust ensemble learning for mining noisy data streams. Decision Support Systems.
An efficient ensemble method for classifying skewed data streams. ICIC'11 Proceedings of the 7th international conference on Intelligent Computing: bio-inspired computing and applications.

Abstract

The two main challenges in mining data streams are concept drift and data noise. Current algorithms rely mainly on the robustness of the base classifier or of the learned ensemble, and have no active mechanism for dealing with noise. However, noise can still cause drastic drops in accuracy. In this paper, we present a clustering-based method that filters hard instances and noisy instances out of data streams. We also propose a trigger to detect concept drift and build RobustBoosting, an ensemble classifier, by boosting the hard instances. We evaluated the RobustBoosting algorithm against the AdaptiveBoosting algorithm [1] on synthetic and real-life data sets. The experimental results show that the proposed method has a substantial advantage over AdaptiveBoosting in prediction accuracy, and that it converges to target concepts efficiently and with high accuracy on data sets with noise levels as high as 40%.
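
The abstract does not say which clustering algorithm or which criteria the filter uses, so the following is only a rough sketch of what a clustering-based split of one stream chunk into clean, hard, and suspected-noise instances might look like. Everything specific here is an assumption made for illustration: the use of k-means with farthest-first initialization, the `purity` threshold, the rule that all members of a mixed cluster count as hard, and the names `kmeans` and `split_chunk`. It is not the RobustBoosting procedure itself.

```python
from collections import Counter


def _dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns a cluster index for every point.

    Assumption: deterministic farthest-first initialization, chosen only
    to keep the sketch reproducible.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def split_chunk(points, labels, k=2, purity=0.75):
    """Split one chunk into (clean, hard, noise) lists of instance indices.

    Hypothetical rule, not taken from the paper:
    - in a cluster dominated by one label, minority-label instances are
      treated as suspected noise and the rest as clean;
    - in a mixed cluster, every instance is treated as hard.
    """
    assign = kmeans(points, k)
    clean, hard, noise = [], [], []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        majority, top = Counter(labels[i] for i in idx).most_common(1)[0]
        if top / len(idx) < purity:
            hard.extend(idx)
            continue
        for i in idx:
            (clean if labels[i] == majority else noise).append(i)
    return clean, hard, noise
```

Under these assumptions, a downstream learner could drop the `noise` indices and give extra weight to the `hard` indices when training the next ensemble member; how RobustBoosting actually weights them is not stated in the abstract.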