A random decision tree classifier is an ensemble of decision trees in which the feature tested at each node is chosen at random from the features still available. A discrete feature already chosen on a decision path cannot be chosen again on that path; a continuous feature can be chosen multiple times, each time with a different splitting value. During classification, each tree outputs a raw posterior probability, and these probabilities are averaged across the ensemble to form the final posterior probability estimate. Although remarkably simple and somewhat counter-intuitive, the random decision tree has been shown to be highly accurate under both 0-1 loss and cost-sensitive loss functions. A preliminary explanation attributed this accuracy to the "error-tolerance" property of probabilistic decision making. Our study shows that the actual reason for the random tree's superior performance is its optimal approximation of each example's true probability of belonging to a given class.
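To make the construction concrete, below is a minimal Python sketch of such an ensemble, assuming examples are represented as {feature: value} dicts. The function names (build_tree, tree_proba, ensemble_proba), the depth limit, and the uniform random threshold for continuous splits are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter

def build_tree(rows, labels, feature_types, used_discrete=frozenset(),
               depth=0, max_depth=5):
    """Grow one random tree. rows: sequence of {feature: value} dicts;
    feature_types maps each feature name to 'discrete' or 'continuous'."""
    # Leaf: depth limit reached or the node is pure.
    if depth >= max_depth or len(set(labels)) <= 1:
        return {'counts': Counter(labels)}
    # A discrete feature used on this path cannot be chosen again;
    # continuous features stay eligible, each time with a fresh split value.
    candidates = [f for f, t in feature_types.items()
                  if t == 'continuous' or f not in used_discrete]
    if not candidates:
        return {'counts': Counter(labels)}
    feat = random.choice(candidates)  # random pick, no purity criterion
    if feature_types[feat] == 'continuous':
        vals = [r[feat] for r in rows]
        lo, hi = min(vals), max(vals)
        if lo == hi:
            return {'counts': Counter(labels)}
        thr = random.uniform(lo, hi)  # assumed rule: threshold drawn between observed extremes
        left = [(r, y) for r, y in zip(rows, labels) if r[feat] <= thr]
        right = [(r, y) for r, y in zip(rows, labels) if r[feat] > thr]
        if not left or not right:
            return {'counts': Counter(labels)}
        return {'feat': feat, 'thr': thr,
                'left': build_tree(*zip(*left), feature_types,
                                   used_discrete, depth + 1, max_depth),
                'right': build_tree(*zip(*right), feature_types,
                                    used_discrete, depth + 1, max_depth)}
    # Discrete feature: one branch per observed value; mark it as used.
    used = used_discrete | {feat}
    branches = {}
    for v in set(r[feat] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[feat] == v]
        branches[v] = build_tree(*zip(*sub), feature_types,
                                 used, depth + 1, max_depth)
    return {'feat': feat, 'branches': branches, 'counts': Counter(labels)}

def tree_proba(node, row, classes):
    """Raw posterior of one tree: class frequencies at the reached leaf."""
    if 'thr' in node:
        child = node['left'] if row[node['feat']] <= node['thr'] else node['right']
        return tree_proba(child, row, classes)
    if 'branches' in node:
        child = node['branches'].get(row[node['feat']])
        if child is not None:
            return tree_proba(child, row, classes)
        # Unseen discrete value: fall back to this node's own class counts.
    counts, n = node['counts'], sum(node['counts'].values())
    return {c: counts[c] / n for c in classes}

def ensemble_proba(trees, row, classes):
    """Average the raw per-tree posteriors into the final estimate."""
    probs = [tree_proba(t, row, classes) for t in trees]
    return {c: sum(p[c] for p in probs) / len(probs) for c in classes}

# Toy usage: one continuous feature 'x' and one discrete feature 'color'.
random.seed(7)
rows = [{'x': random.random(), 'color': random.choice('ab')} for _ in range(200)]
labels = [int(r['x'] > 0.5) for r in rows]
ftypes = {'x': 'continuous', 'color': 'discrete'}
trees = [build_tree(rows, labels, ftypes) for _ in range(10)]
print(ensemble_proba(trees, {'x': 0.9, 'color': 'a'}, classes=[0, 1]))
```

Note that no split quality is ever measured: the only information from the data used during growth is which feature values occur, which is what makes the method's accuracy counter-intuitive and the probability-averaging step essential.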