Many are better than one: improving probabilistic estimates from decision trees

Authors:
Nitesh V. Chawla
Affiliations:
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN
Venue:
MLCW'05 Proceedings of the First international conference on Machine Learning Challenges: evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment
Year:
2005

Citing 20
Cited 1

Simplifying decision trees

International Journal of Man-Machine Studies - Special Issue: Knowledge Acquisition for Knowledge-based Systems. Part 5
Logical versus analogical or symbolic versus connectionist or neat versus scruffy

AI Magazine
C4.5: programs for machine learning

C4.5: programs for machine learning
Bagging predictors

Machine Learning
The Random Subspace Method for Constructing Decision Forests

IEEE Transactions on Pattern Analysis and Machine Intelligence
Inductive learning algorithms and representations for text categorization

Proceedings of the seventh international conference on Information and knowledge management
Machine Learning for the Detection of Oil Spills in Satellite Radar Images

Machine Learning - Special issue on applications of machine learning and the knowledge discovery process
An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

Machine Learning
Learning and making decisions when costs and probabilities are both unknown

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

Machine Learning
Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy

Machine Learning
Bayes and Pseudo-Bayes Estimates of Conditional Probabilities and Their Reliability

ECML '93 Proceedings of the European Conference on Machine Learning
The Case against Accuracy Estimation for Comparing Induction Algorithms

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Feature Selection for Unbalanced Class Distribution and Naive Bayes

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Tree Induction for Probability-Based Ranking

Machine Learning
Bagging in Computer Vision

CVPR '98 Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
The class imbalance problem: A systematic study

Intelligent Data Analysis
SMOTE: synthetic minority over-sampling technique

Journal of Artificial Intelligence Research
The use of the area under the ROC curve in the evaluation of machine learning algorithms

Pattern Recognition
Bagging, boosting, and C4.S

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1

Compact ensemble trees for imbalanced data

MCS'11 Proceedings of the 10th international conference on Multiple classifier systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Decision trees, a popular choice for classification, have their limitation in providing probability estimates, requiring smoothing at the leaves. Typically, smoothing methods such as Laplace or m-estimate are applied at the decision tree leaves to overcome the systematic bias introduced by the frequency-based estimates. In this work, we show that an ensemble of decision trees significantly improves the quality of the probability estimates produced at the decision tree leaves. The ensemble overcomes the myopia of the leaf frequency based estimates. We show the effectiveness of the probabilistic decision trees as a part of the Predictive Uncertainty Challenge. We also include three additional highly imbalanced datasets in our study. We show that the ensemble methods significantly improve not only the quality of the probability estimates but also the AUC for the imbalanced datasets.