A RANDOM DECISION TREE ENSEMBLE FOR MINING CONCEPT DRIFTS FROM NOISY DATA STREAMS

  • Authors:
  • Peipei Li;Xindong Wu;Xuegang Hu;Qianhui Liang;Yunjun Gao

  • Affiliations:
  • School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China;School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China, Department of Computer Science, University of Vermont, Vermont, USA;School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China;School of Information Systems, Singapore Management University, Singapore;College of Computer Science, Zhejiang University, China

  • Venue:
  • Applied Artificial Intelligence
  • Year:
  • 2010

Abstract

Detecting concept drifts and reducing the impact of noise are challenging but valuable tasks for inductive learning in real data stream applications, especially when time and space overheads must be kept low. Although many inductive learning algorithms based on ensemble classification models have been proposed for handling concept-drifting data streams, little attention has been paid to detecting the different types of concept drift and handling the influence of noise simultaneously. Motivated by this, in this article we present a new lightweight inductive algorithm for concept drift detection based on an ensemble of random decision trees (named CDRDT), which distinguishes various types of concept drift in noisy data streams. We use small data chunks of variable size to generate random decision trees incrementally. Meanwhile, we introduce the Hoeffding bound inequality and the principle of statistical quality control to detect the different types of concept drift and noise. Extensive studies on synthetic and real streaming data demonstrate that CDRDT can effectively and efficiently detect concept drifts in noisy streaming data. Our algorithm therefore provides a feasible reference framework for classifying concept-drifting data streams with noise.
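The abstract names two statistical tools, the Hoeffding bound and statistical quality control, without giving formulas. The sketch below illustrates them in their standard textbook form: the Hoeffding bound on the deviation of an observed mean, and a generic p-chart style test that compares a chunk's error rate against a reference error rate. The function names, the reference-error setup, and the 2σ warning and 3σ drift limits are assumptions chosen for illustration. They are not the thresholds or decision rules CDRDT actually uses.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability at least 1 - delta, the true
    mean of a variable with range `value_range` lies within the returned
    epsilon of the mean observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def control_chart_state(base_error: float, chunk_error: float, n: int,
                        warn_k: float = 2.0, drift_k: float = 3.0) -> str:
    """Hypothetical p-chart check on one data chunk (illustrative only).

    base_error:  reference error rate from earlier chunks (an assumption).
    chunk_error: ensemble error rate measured on the current chunk.
    n:           number of examples in the current chunk.
    warn_k, drift_k: assumed sigma multipliers for the warning/drift limits.
    """
    # Standard deviation of a binomial error rate estimated from n examples.
    sigma = math.sqrt(base_error * (1.0 - base_error) / n)
    if chunk_error > base_error + drift_k * sigma:
        return "drift"    # error rose beyond the upper control limit
    if chunk_error > base_error + warn_k * sigma:
        return "warning"  # could be noise or the start of a gradual drift
    return "stable"


if __name__ == "__main__":
    # A larger chunk gives a tighter bound on the observed error rate.
    print(hoeffding_bound(1.0, 0.05, 200), hoeffding_bound(1.0, 0.05, 800))
    print(control_chart_state(0.10, 0.13, 500))  # warning
    print(control_chart_state(0.10, 0.20, 500))  # drift
```

Separating a "warning" zone from a "drift" zone is one common way to avoid reacting to isolated noisy chunks. A chunk in the warning zone can be watched over the next few chunks before the ensemble is updated.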