C4.5: programs for machine learning
The Random Subspace Method for Constructing Decision Forests
IEEE Transactions on Pattern Analysis and Machine Intelligence
BOAT—optimistic decision tree construction
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Mining high-speed data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Machine Learning
SPRINT: A Scalable Parallel Classifier for Data Mining
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Is random model better? On its accuracy and efficiency
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Accurate decision trees for mining high-speed data streams
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A semi-random multiple decision-tree algorithm for mining data streams
Journal of Computer Science and Technology
On Appropriate Assumptions to Mine Data Streams: Analysis and Practice
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
On the optimality of probability estimation by random decision trees
AAAI'04 Proceedings of the 19th National Conference on Artificial Intelligence
Concept Drifting Detection on Noisy Streaming Data in Random Ensemble Decision Trees
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Random ensemble decision trees for learning concept-drifting data streams
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Mining Recurring Concept Drifts with Limited Labeled Streaming Data
ACM Transactions on Intelligent Systems and Technology (TIST)
The induction error of a random tree ensemble stems mainly from two factors: the strength of the individual decision trees and the dependency between the base classifiers. To reduce the error due to both factors, we propose Semi-Random Decision Tree Ensembling (SRDTE) for mining streaming data, building on our previous work on SRMTDS. The model consists of semi-random decision trees that are generated independently and do not interact with each other when making their individual classification decisions; the main idea is to minimize the correlation among the classifiers. We argue that the strength of the decision trees is closely tied to the values of several parameters, including the height of a tree, the number of trees, and the parameter n_min in the Hoeffding bound. We analyze these parameters and design strategies that better adapt the model to streaming data, the main ones being incremental generation of sub-trees after real training instances have been seen, a data structure for fast search, and a voting mechanism for classification. Evaluation under the 0-1 loss function shows that SRDTE improves predictive accuracy and robustness. We have applied SRDTE to e-business data streams and demonstrated its feasibility and effectiveness.
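The abstract's two core ingredients can be illustrated with a short sketch: trees whose split attributes are chosen at random (so that trees are generated independently, minimizing correlation), combined by majority vote, together with the Hoeffding bound that governs the n_min parameter. This is a hypothetical simplification for illustration only, not the paper's SRDTE algorithm: it assumes binary 0/1 features, tests one randomly chosen attribute per level, and updates leaf class counts incrementally as instances stream in. The class names `SemiRandomTree` and `SRDTEnsemble` and all parameters shown are assumptions of this sketch.

```python
import math
import random
from collections import Counter

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    random variable with range `value_range` lies within this epsilon
    of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

class SemiRandomTree:
    """A fixed-height tree whose tested attributes are drawn at random
    (hypothetical simplification: one binary attribute per level)."""
    def __init__(self, n_attrs, height, rng):
        # distinct random attributes define the test path of this tree
        self.attrs = rng.sample(range(n_attrs), height)
        self.leaf_counts = {}  # leaf id -> Counter of class labels

    def _leaf_id(self, x):
        # the sequence of tested attribute values identifies the leaf
        return tuple(x[a] for a in self.attrs)

    def update(self, x, y):
        # incremental update: bump the class count at the reached leaf
        self.leaf_counts.setdefault(self._leaf_id(x), Counter())[y] += 1

    def predict(self, x):
        counts = self.leaf_counts.get(self._leaf_id(x))
        return counts.most_common(1)[0][0] if counts else None

class SRDTEnsemble:
    """Independently generated semi-random trees, combined by majority
    vote; trees never interact during individual decisions."""
    def __init__(self, n_trees, n_attrs, height, seed=0):
        rng = random.Random(seed)
        self.trees = [SemiRandomTree(n_attrs, height, rng)
                      for _ in range(n_trees)]

    def update(self, x, y):
        for t in self.trees:
            t.update(x, y)

    def predict(self, x):
        votes = Counter(v for v in (t.predict(x) for t in self.trees)
                        if v is not None)
        return votes.most_common(1)[0][0] if votes else None
```

Because the attribute paths are drawn before any data are seen, each tree's structure is independent of the others; only the leaf statistics are learned from the stream, which is what makes the incremental, single-pass update cheap.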