In this paper, we show that a continuous spectrum of randomisation exists, and that most existing tree randomisation methods operate only around the two ends of this spectrum, leaving a large part of it unexplored. We propose a base learner, VR-Tree, which generates trees with variable randomness. VR-Trees can span from conventional deterministic trees to complete-random trees through a single probabilistic parameter. Using VR-Trees as the base models, we explore the entire spectrum of randomised ensembles, together with Bagging and Random Subspace. We discover that the two halves of the spectrum have distinct characteristics, and this understanding allows us to propose a new approach to building better decision tree ensembles. We name this approach Coalescence; it coalesces a number of points in the random half of the spectrum. Coalescence acts as a committee of "experts" to cater for unforeseeable conditions presented in the training data. Coalescence is found to perform better than any single operating point in the spectrum, without the need to tune to a specific level of randomness. In our empirical study, Coalescence ranks top among the benchmark ensemble methods, including Random Forests, Random Subspace and C5 Boosting; it is also the only method in the comparison that is significantly better than both Bagging and Max-Diverse Ensemble. Although Coalescence is not significantly better than Random Forests, we have identified conditions under which one will perform better than the other.
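The mechanism behind variable randomness can be illustrated at the level of a single node: a Bernoulli trial with some parameter (call it alpha) decides whether the node splits deterministically, by searching for the best split under a purity criterion, or completely at random, ignoring the class labels. Setting alpha to 1 recovers a conventional deterministic tree; setting it to 0 recovers a complete-random tree; intermediate values traverse the spectrum. The following is a minimal Python sketch of that node-level decision, assuming numeric attributes, binary splits, and an information-gain criterion; the function names and the exact criterion are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of VR-Tree split selection (illustrative, not the
# authors' code). Assumes numeric attributes and binary splits.
import numpy as np

def entropy(y):
    # Shannon entropy of the class labels in y.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_deterministic_split(X, y):
    # Deterministic end of the spectrum: exhaustive search for the
    # attribute/threshold pair with the highest information gain.
    base, best = entropy(y), (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:   # skip max value: both sides non-empty
            left = X[:, j] <= t
            w = left.mean()
            gain = base - w * entropy(y[left]) - (1 - w) * entropy(y[~left])
            if gain > best[2]:
                best = (j, t, gain)
    return best[0], best[1]

def random_split(X, rng):
    # Complete-random end of the spectrum: attribute and cut-point
    # drawn uniformly at random, without looking at the labels.
    j = rng.integers(X.shape[1])
    vals = X[:, j]
    return j, rng.uniform(vals.min(), vals.max())

def vr_split(X, y, alpha, rng):
    # alpha = 1 reproduces a deterministic tree, alpha = 0 a
    # complete-random tree; intermediate values span the spectrum.
    if rng.random() < alpha:
        return best_deterministic_split(X, y)
    return random_split(X, rng)

# Example usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
print(vr_split(X, y, alpha=0.5, rng=rng))
```

Under this reading, a Coalescence-style ensemble would simply grow each tree with its own alpha drawn from the random half of the spectrum (e.g. uniformly from (0, 0.5]) and aggregate their votes, rather than tuning a single fixed alpha for all trees.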