One way to lower the generalization error of a decision tree ensemble is to maximize tree diversity. Building complete-random trees forgoes the strength obtained from a test selection criterion, but it achieves higher tree diversity. We provide a taxonomy of randomization methods and find that complete-random test selection produces diverse trees, whereas other randomization methods, such as bootstrap sampling, may impair tree growth and limit tree diversity. The well-accepted practice in constructing decision tree ensembles is to apply bootstrap sampling and voting. To challenge this practice, we explore eight variants of complete-random trees along three parameters: ensemble method, tree-height restriction, and sample randomization. Surprisingly, the most accurate variant is very simple and performs comparably to Bagging and Random Forests. It achieves good results by maximizing tree diversity and is called Max-diverse Ensemble.