Classifying noisy data streams

  • Authors:
  • Yong Wang;Zhanhuai Li;Yang Zhang

  • Affiliations:
  • Dept. Computer Science & Software, Northwestern Polytechnical University, P.R. China;Dept. Computer Science & Software, Northwestern Polytechnical University, P.R. China;School of Information Engineering, Northwest A&F University, P.R. China

  • Venue:
  • FSKD'06: Proceedings of the Third International Conference on Fuzzy Systems and Knowledge Discovery
  • Year:
  • 2006

Abstract

The two main challenges in mining data streams are concept drift and data noise. Current algorithms rely mainly on the robustness of the base classifier or of the learning ensemble, and have no active mechanism for dealing with noise. As a result, noise can still induce drastic drops in accuracy. In this paper, we present a clustering-based method to filter out hard instances and noisy instances from data streams. We also propose a trigger to detect concept drift, and we build RobustBoosting, an ensemble classifier, by boosting the hard instances. We evaluated the RobustBoosting algorithm and the AdaptiveBoosting algorithm [1] on synthetic and real-life data sets. The experimental results show that the proposed method has a substantial advantage over the AdaptiveBoosting algorithm in prediction accuracy, and that it converges to target concepts efficiently and with high accuracy on data sets with noise levels as high as 40%.
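
The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of the general idea behind clustering-based noise filtering, not the authors' algorithm. It assumes that an instance is suspect when its label disagrees with the majority label of the cluster it falls into. The names `kmeans` and `flag_suspects`, the choice of k-means, and the evenly spaced initialization are all hypothetical choices made for illustration; separating hard instances from true noise, as the paper does, would need a further criterion that is not given here.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic start (assumption: k <= len(points))."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Illustrative initialization: pick k points evenly spaced through the chunk.
    step = len(points) // k
    centers = [tuple(points[i * step]) for i in range(k)]
    for _ in range(iters):
        assign = [nearest_center(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment against the last set of centers.
    return [nearest_center(p, centers) for p in points]


def flag_suspects(points, labels, k=2, iters=20):
    """Return sorted indices whose label differs from their cluster's majority label."""
    assign = kmeans(points, k, iters)
    suspects = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        majority = Counter(labels[i] for i in idx).most_common(1)[0][0]
        suspects.extend(i for i in idx if labels[i] != majority)
    return sorted(suspects)
```

In a streaming setting, a function like this would be applied to each incoming chunk of labeled instances before the chunk is used to train or update the ensemble; the flagged indices could then be dropped or passed to a second test that decides whether they are noise or merely hard.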