Robust ensemble learning for mining noisy data streams

  • Authors:
  • Peng Zhang; Xingquan Zhu; Yong Shi; Li Guo; Xindong Wu

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China; Centre for Quantum Computation & Intelligent Systems, University of Technology Sydney, Broadway, NSW 2007, Australia; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, China and College of Information Science & Technology, Univ. of Nebraska at Omaha, Omaha, NE 68182, US ...; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China; School of Computer Science & Information Eng., Hefei University of Technology, Hefei 230009, China and Department of Computer Science, University of Vermont, Burlington, VT 05405, USA

  • Venue:
  • Decision Support Systems
  • Year:
  • 2011

Abstract

In this paper, we study the problem of learning from concept-drifting data streams with noise, where samples in a data stream may be mislabeled or contain erroneous values. Our essential goal is to build a robust prediction model from noisy stream data that accurately predicts future samples. For noisy data sources, most existing work relies on data preprocessing techniques to cleanse noisy samples before decision models are trained. In data stream environments, these preprocessing techniques are unfortunately hard to apply, mainly because concept drift in a data stream can make it very difficult to differentiate noise from samples of changing concepts. Accordingly, we propose an aggregate ensemble (AE) learning framework, whose aim is to build a robust ensemble model that can tolerate data errors. Theoretical and empirical studies on both synthetic and real-world data streams demonstrate that the proposed AE learning framework is capable of building accurate classification models from noisy data streams.