In this paper, we show that a continuous spectrum of randomisation exists, and that most existing tree randomisation methods operate only around the two ends of this spectrum, leaving a large part of it unexplored. We propose a base learner, VR-Tree, which generates trees with variable randomness. VR-Trees can span from conventional deterministic trees to complete-random trees through a single probabilistic parameter. Using VR-Trees as the base models, we explore the entire spectrum of randomised ensembles, together with Bagging and Random Subspace. We discover that the two halves of the spectrum have distinct characteristics, and this understanding allows us to propose a new approach to building better decision tree ensembles. We name this approach Coalescence; it coalesces a number of points in the random half of the spectrum. Coalescence acts as a committee of "experts" to cater for unforeseeable conditions presented in the training data. Coalescence is found to perform better than any single operating point in the spectrum, without the need to tune to a specific level of randomness. In our empirical study, Coalescence ranks top among the benchmark ensemble methods, including Random Forests, Random Subspace and C5 Boosting; it is also the only method in the comparison that is significantly better than both Bagging and Max-Diverse Ensemble. Although Coalescence is not significantly better than Random Forests, we have identified conditions under which one will perform better than the other.
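The mechanism behind variable randomness can be illustrated at the level of a single node: a Bernoulli trial with some parameter (call it alpha) decides whether the node splits deterministically, by searching for the best split under a purity criterion, or completely at random, ignoring the class labels. Setting alpha to 1 recovers a conventional deterministic tree; setting it to 0 recovers a complete-random tree; intermediate values traverse the spectrum. The following is a minimal Python sketch of that node-level decision, assuming numeric attributes, binary splits, and an information-gain criterion; the function names and the exact criterion are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of VR-Tree split selection (illustrative, not the
# authors' code). Assumes numeric attributes and binary splits.
import numpy as np

def entropy(y):
    # Shannon entropy of the class labels in y.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_deterministic_split(X, y):
    # Deterministic end of the spectrum: exhaustive search for the
    # attribute/threshold pair with the highest information gain.
    base, best = entropy(y), (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:   # skip max value: both sides non-empty
            left = X[:, j] <= t
            w = left.mean()
            gain = base - w * entropy(y[left]) - (1 - w) * entropy(y[~left])
            if gain > best[2]:
                best = (j, t, gain)
    return best[0], best[1]

def random_split(X, rng):
    # Complete-random end of the spectrum: attribute and cut-point
    # drawn uniformly at random, without looking at the labels.
    j = rng.integers(X.shape[1])
    vals = X[:, j]
    return j, rng.uniform(vals.min(), vals.max())

def vr_split(X, y, alpha, rng):
    # alpha = 1 reproduces a deterministic tree, alpha = 0 a
    # complete-random tree; intermediate values span the spectrum.
    if rng.random() < alpha:
        return best_deterministic_split(X, y)
    return random_split(X, rng)

# Example usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
print(vr_split(X, y, alpha=0.5, rng=rng))
```

Under this reading, a Coalescence-style ensemble would simply grow each tree with its own alpha drawn from the random half of the spectrum (e.g. uniformly from (0, 0.5]) and aggregate their votes, rather than tuning a single fixed alpha for all trees.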