Effective Estimation of Posterior Probabilities: Explaining the Accuracy of Randomized Decision Tree Approaches

  • Authors:
  • Wei Fan; Ed Greengrass; Joe McCloskey; Philip S. Yu; Kevin Drummey

  • Affiliations:
  • IBM T.J. Watson Research; US Department of Defense; US Department of Defense; IBM T.J. Watson Research; US Department of Defense

  • Venue:
  • ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
  • Year:
  • 2005

Abstract

A growing number of independently proposed randomization methods inject randomness at different stages of decision tree construction in order to build multiple trees. Randomized decision tree methods have been reported to be significantly more accurate than widely accepted single decision trees, even though the training procedure of some methods is surprisingly random and thus runs counter to the conventional practice of employing gain functions to choose optimal features at each node and computing a single tree that fits the data. One important question that is not yet well understood is the reason behind this high accuracy. We provide an insight based on posterior probability estimation. We first establish the relationship between effective posterior probability estimation and effective loss reduction. We then argue that randomized decision tree methods effectively approximate the true probability distribution within the decision tree hypothesis space. We conduct experiments on both synthetic and real-world datasets under both 0-1 and cost-sensitive loss functions.
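The core idea the abstract describes can be sketched in a few lines: grow each tree with purely random feature/threshold choices (no gain function), store class frequencies at the leaves, and estimate the posterior P(y | x) by averaging those frequencies over the ensemble. The sketch below is an illustrative simplification under assumed details (random thresholds drawn uniformly, a fixed depth limit), not the specific algorithm evaluated in the paper.

```python
import random
from collections import Counter


def leaf(y):
    """A leaf stores empirical class frequencies, i.e. a local posterior estimate."""
    counts = Counter(y)
    total = sum(counts.values())
    return {"leaf": {c: n / total for c, n in counts.items()}}


def build_random_tree(X, y, depth=3):
    """Grow a tree by picking a *random* feature and threshold at each node,
    with no gain function at all (the 'surprisingly random' training step)."""
    if depth == 0 or len(set(y)) <= 1:
        return leaf(y)
    f = random.randrange(len(X[0]))
    values = [x[f] for x in X]
    if min(values) == max(values):
        return leaf(y)
    t = random.uniform(min(values), max(values))  # assumed: uniform threshold
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    if not li or not ri:
        return leaf(y)
    return {
        "feature": f,
        "threshold": t,
        "left": build_random_tree([X[i] for i in li], [y[i] for i in li], depth - 1),
        "right": build_random_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1),
    }


def tree_proba(node, x):
    """Route x to its leaf and return that leaf's class-frequency estimate."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def ensemble_proba(trees, x, classes):
    """Posterior estimate: average the leaf frequencies across all trees."""
    return {c: sum(tree_proba(t, x).get(c, 0.0) for t in trees) / len(trees)
            for c in classes}


# Toy usage on a separable 1-D problem.
random.seed(0)
X = [[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1]
trees = [build_random_tree(X, y) for _ in range(30)]
p = ensemble_proba(trees, [0.05], classes=[0, 1])
```

Each individual tree is a crude, high-variance estimator, but because every leaf's frequencies sum to one, the averaged ensemble output is again a valid probability vector, and averaging many randomized trees smooths it toward the true posterior, which is the paper's explanation for the accuracy of such methods.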