Maximum a posteriori pruning on decision trees and its application to bootstrap BUMPing

Authors:
Jinseog Kim; Yongdai Kim
Affiliations:
Statistical Research Center for Complex Systems, Seoul National University, Seoul 151-742, Republic of Korea; Department of Statistics, Seoul National University, Seoul 151-742, Republic of Korea
Venue:
Computational Statistics & Data Analysis
Year:
2006

Abstract

Cost-complexity pruning generates a sequence of nested subtrees and selects the best one among them. However, its computational cost is high because the selection step requires a holdout sample or cross-validation. On the other hand, pruning algorithms based on posterior calculations, such as BIC (MDL) and MEP, are faster, but they sometimes produce trees that are too large or too small, which leads to poor generalization error. In this paper, we propose an alternative pruning procedure that combines the ideas of cost-complexity pruning and posterior calculation. The proposed algorithm uses only the training sample, so its computational cost is almost the same as that of the other posterior-based algorithms, while it yields accuracy similar to that of cost-complexity pruning. Moreover, it can be used to compare non-nested trees, which is necessary for the BUMPing procedure. Empirical results show that the proposed algorithm performs similarly to cost-complexity pruning in standard situations and works better for BUMPing.
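For reference, the nested-subtree step of cost-complexity pruning can be illustrated with the weakest-link procedure from CART. That procedure minimizes R_alpha(T) = R(T) + alpha * |leaves(T)| by repeatedly collapsing the internal node t with the smallest g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1). The sketch below is a minimal illustration under stated assumptions, not the MAP pruning method proposed in this paper. The `Node` class and function names are hypothetical. Node errors are assumed to be misclassified training-case counts, so alpha comes out on the same count scale. Tied nodes are pruned one at a time rather than simultaneously. The expensive final step, choosing one subtree from the sequence by holdout or cross-validation, is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Hypothetical representation: `error` is R(t), the number of training
    # cases misclassified if this node were turned into a leaf.
    error: float
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """All terminal nodes of the branch rooted at `node`."""
    if node.is_leaf():
        return [node]
    out: List[Node] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def internal_nodes(node: Node) -> List[Node]:
    """All non-terminal nodes of the branch rooted at `node`."""
    if node.is_leaf():
        return []
    out = [node]
    for child in node.children:
        out.extend(internal_nodes(child))
    return out


def branch_error(node: Node) -> float:
    """R(T_t): total error of the leaves of the branch rooted at `node`."""
    return sum(leaf.error for leaf in leaves(node))


def link_strength(node: Node) -> float:
    """g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1)."""
    return (node.error - branch_error(node)) / (len(leaves(node)) - 1)


def weakest_link_sequence(root: Node) -> List[Tuple[float, int]]:
    """Prune `root` in place, returning (alpha, number of leaves) for each
    subtree in the nested sequence T_0 > T_1 > ... > {root}."""
    sequence = [(0.0, len(leaves(root)))]
    while not root.is_leaf():
        weakest = min(internal_nodes(root), key=link_strength)
        alpha = link_strength(weakest)
        weakest.children = []  # collapse the weakest branch into a leaf
        sequence.append((alpha, len(leaves(root))))
    return sequence
```

For example, with a root of error 50 whose children are an internal node of error 20 (two leaves of error 5 each) and a leaf of error 10, `weakest_link_sequence(root)` returns `[(0.0, 3), (10.0, 2), (20.0, 1)]`. Because each subtree is obtained by collapsing nodes of the previous one, the trees are nested. This is why the criterion cannot directly rank the non-nested trees that BUMPing produces from different bootstrap samples.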