A semi-random multiple decision-tree algorithm for mining data streams

  • Authors:
  • Xue-Gang Hu; Pei-pei Li; Xin-Dong Wu; Gong-Qing Wu

  • Affiliations:
  • School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China (all authors)
  • Department of Computer Science, University of Vermont, Burlington, VT (Xin-Dong Wu)

  • Venue:
  • Journal of Computer Science and Technology
  • Year:
  • 2007

Abstract

Mining data streams is an active topic in data mining. When classifying data streams, traditional decision-tree algorithms such as ID3 and C4.5 are relatively inefficient in both time and space because of the characteristics of streaming data. Random decision trees, by contrast, offer some advantages in time and space. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of split-examples, a heuristic method that computes information gain to obtain split thresholds for numerical attributes, and a Naïve Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy, and noise resistance.
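To illustrate how a Hoeffding bound can fix a minimum number of split-examples, here is a minimal sketch of the standard bound as used in VFDT-style stream learners. The abstract does not say how SRMTDS chooses the range R, the confidence parameter delta, or the tolerance epsilon, so the function names and parameter values below are hypothetical. The sketch assumes the textbook form epsilon = sqrt(R^2 ln(1/delta) / (2n)), solved for n, and assumes R = log2(c) for information gain over c classes.

```python
import math


def info_gain_range(num_classes: int) -> float:
    """Assumed range R of information gain for c classes: log2(c)."""
    return math.log2(num_classes)


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range value_range lies within epsilon of the mean
    observed over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_split_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Hypothetical helper: smallest n whose Hoeffding epsilon is at most
    `epsilon`, from solving epsilon = sqrt(R^2 ln(1/delta) / (2n)) for n."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Two classes, delta = 1e-7, epsilon = 0.05 (illustrative values only).
    n = min_split_examples(info_gain_range(2), 1e-7, 0.05)
    print(n, hoeffding_epsilon(info_gain_range(2), 1e-7, n))
```

With two classes, delta = 1e-7 and epsilon = 0.05, the sketch requires 3224 examples at a leaf before a split is considered. A smaller delta or epsilon raises that count, trading slower tree growth for more confidence that the split is the one the full stream would choose.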