Mining Concept-Drifting Data Streams with Multiple Semi-Random Decision Trees

  • Authors:
  • Peipei Li; Xuegang Hu; Xindong Wu

  • Affiliations:
  • School of Computer and Information, Hefei University of Technology, Hefei, China 230009; School of Computer and Information, Hefei University of Technology, Hefei, China 230009; School of Computer and Information, Hefei University of Technology, Hefei, China 230009 and Department of Computer Science, University of Vermont, Burlington, VT 05405, U.S.A.

  • Venue:
  • ADMA '08: Proceedings of the 4th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2008

Abstract

Classification of concept-drifting data streams has found wide application. However, many classification algorithms for streaming data are designed for particular, fixed characteristics of concept drift and cannot handle the impact of noise on concept-drift detection. This paper presents MSRT, an incremental algorithm built on Multiple Semi-Random decision Trees for concept-drifting data streams. MSRT uses two sliding windows, one for training and one for testing; applies the Hoeffding bound inequality to set the thresholds that distinguish true drift from noise; and adopts the classification function to estimate the error rate for periodic concept-drift detection. Our extensive empirical study shows that, compared with CVFDT, a state-of-the-art decision-tree algorithm for classifying concept-drifting data streams, MSRT improves running time, accuracy, and robustness.
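The abstract's idea of using Hoeffding bounds to separate true drift from noise can be illustrated with a small sketch. For n independent observations with range R, the Hoeffding bound gives epsilon = sqrt(R^2 ln(1/delta) / (2n)): with probability at least 1 - delta, the observed mean lies within epsilon of its expected value. The code below is a minimal sketch under stated assumptions, not MSRT's actual procedure: it assumes 0/1 prediction errors (so R = 1), consecutive fixed-size test windows, the first window as the reference error rate, and a hypothetical two-threshold rule built from two confidence levels. The names `classify_error_change` and `detect_periodically` and the values `delta_drift = 0.01` and `delta_warn = 0.1` are illustrative choices, not taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the mean
    of n independent observations with range value_range lies within epsilon
    of its expected value."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def classify_error_change(ref_error: float, cur_error: float, n: int,
                          delta_drift: float = 0.01,
                          delta_warn: float = 0.1) -> str:
    """Hypothetical two-threshold rule (an assumption, not MSRT's exact test).

    ref_error and cur_error are error rates measured on windows of n examples.
    Errors are 0/1, so the range R is 1.
    """
    eps_drift = hoeffding_bound(1.0, delta_drift, n)  # smaller delta -> larger, stricter threshold
    eps_warn = hoeffding_bound(1.0, delta_warn, n)    # larger delta -> smaller, looser threshold
    increase = cur_error - ref_error
    if increase > eps_drift:
        return "drift"    # increase is too large to be explained by sampling noise
    if increase > eps_warn:
        return "warning"  # suspicious increase; keep monitoring
    return "noise"        # within the bound: treat as noise


def detect_periodically(errors: list[int], window: int) -> list[str]:
    """Split a 0/1 error sequence into consecutive test windows and compare each
    later window's error rate with that of the first (reference) window."""
    rates = [sum(errors[i:i + window]) / window
             for i in range(0, len(errors) - window + 1, window)]
    return [classify_error_change(rates[0], r, window) for r in rates[1:]]


if __name__ == "__main__":
    def make_window(num_errors: int, size: int = 200) -> list[int]:
        # Synthetic window with a given number of misclassifications.
        return [1] * num_errors + [0] * (size - num_errors)

    # Error rates per window: 0.10 (reference), 0.12, 0.19, 0.30.
    stream = make_window(20) + make_window(24) + make_window(38) + make_window(60)
    print(detect_periodically(stream, 200))  # ['noise', 'warning', 'drift']
```

With windows of 200 examples, the thresholds are about 0.076 (delta = 0.1) and 0.107 (delta = 0.01), so a 0.02 rise in error is treated as noise, a 0.09 rise as a warning, and a 0.20 rise as drift. Larger windows shrink epsilon and make the check more sensitive, at the cost of reacting later.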