Cleansing Noisy Data Streams

Authors:
Xingquan Zhu;Peng Zhang;Xindong Wu;Dan He;Chengqi Zhang;Yong Shi
Affiliations:
-;-;-;-;-;-
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Citing 0
Cited 5

Robust ensemble learning for mining noisy data streams

Decision Support Systems
Active learning from stream data using optimal weight classifier ensemble

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Enabling fast prediction for ensemble models on data streams

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
A framework for application-driven classification of data streams

Neurocomputing
Information enhancement for data mining

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we identify a new research problem on cleansing noisy data streams which contain incorrectly labeled training examples. The objective is to accurately identify and remove mislabeled data, such that the prediction models built from the cleansed streams can be more accurate than the ones trained from the raw noisy streams. For this purpose, we first use bias-variance decomposition to derive a maximum variance margin (MVM) principle for stream data cleansing. Following this principle, we further propose a local and global filtering (LgF) framework to combine the strength of local noise filtering (within one single data chunk) and global noise filtering (across a number of adjacent data chunks) to identify erroneous data. Experimental results on six data streams (including two real-world data streams) demonstrate that LgF significantly outperforms simple methods in identifying noisy examples.