One way to lower the generalization error of a decision tree ensemble is to maximize tree diversity. Building complete-random trees forgoes the strength obtained from a test selection criterion, but it achieves higher tree diversity. We provide a taxonomy of randomization methods and find that complete-random test selection produces diverse trees, whereas other randomization methods, such as bootstrap sampling, may impair tree growth and limit tree diversity. The well-accepted practice in constructing decision tree ensembles is to apply bootstrap sampling and voting. To challenge this practice, we explore eight variants of complete-random trees along three parameters: ensemble method, tree-height restriction, and sample randomization. Surprisingly, the most accurate variant is very simple and performs comparably to Bagging and Random Forests. It achieves good results by maximizing tree diversity and is called Max-diverse Ensemble.