A semi-random multiple decision-tree algorithm for mining data streams

Authors:
Xue-Gang Hu; Pei-pei Li; Xin-Dong Wu; Gong-Qing Wu
Affiliations:
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China and Department of Computer Science, University of Vermont, Burlington, VT; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
Venue:
Journal of Computer Science and Technology
Year:
2007

Abstract

Mining streaming data is an active research topic in data mining. When classifying data streams, traditional decision-tree algorithms such as ID3 and C4.5 are relatively inefficient in both time and space because of the characteristics of streaming data. Random decision trees offer advantages in both time and space. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of examples required before splitting a node, a heuristic method that computes information gain to obtain split thresholds for numerical attributes, and a Naïve Bayes classifier to estimate the class labels at tree leaves. Our extensive experimental study shows that, compared with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, SRMTDS achieves improved performance in time, space, accuracy, and robustness to noise.
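The three ingredients named in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. The names `hoeffding_bound`, `min_split_examples`, `best_threshold` and `NaiveBayesLeaf` are hypothetical. The Hoeffding-bound helpers assume the standard form ε = sqrt(R² ln(1/δ) / (2n)) used by VFDT-style learners. Because the abstract does not specify the SRMTDS heuristic for numerical thresholds, `best_threshold` substitutes a plain exhaustive search over midpoints. The leaf classifier assumes nominal attributes and Laplace smoothing.

```python
import math
from collections import Counter, defaultdict


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` lies within epsilon of its mean over n examples.
    Assumes the standard VFDT-style form, not a formula taken from SRMTDS."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_split_examples(value_range, delta, epsilon):
    """Smallest n for which hoeffding_bound(value_range, delta, n) <= epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_threshold(values, labels):
    """Hypothetical stand-in for the SRMTDS heuristic: try every midpoint between
    consecutive distinct values and return (threshold, information_gain)
    for the best binary split, or (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = ((lo + hi) / 2.0, gain)
    return best


class NaiveBayesLeaf:
    """Hypothetical leaf model: incrementally updated counts for nominal
    attributes, with Laplace smoothing at prediction time."""

    def __init__(self):
        self.class_counts = Counter()
        # (attribute index, attribute value, class) -> count
        self.value_counts = Counter()
        # attribute index -> set of values seen so far
        self.attr_values = defaultdict(set)

    def update(self, x, y):
        """Absorb one streaming example x (tuple of nominal values) with label y."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.value_counts[(i, v, y)] += 1
            self.attr_values[i].add(v)

    def predict(self, x):
        """Return the class with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for i, v in enumerate(x):
                k = len(self.attr_values[i]) or 1  # distinct values of attribute i
                score += math.log((self.value_counts[(i, v, c)] + 1) / (n_c + k))
            if score > best_score:
                best_class, best_score = c, score
        return best_class
```

For two classes, information gain lies in [0, 1], so R = 1. Under these assumptions, with δ = 1e-7 and a tolerance ε = 0.05, `min_split_examples(1.0, 1e-7, 0.05)` returns 3224, meaning a leaf would wait for that many examples before committing to a split.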