C4.5: Programs for Machine Learning.
An Experimental and Theoretical Comparison of Model Selection Methods. Machine Learning, special issue on the Eighth Annual Conference on Computational Learning Theory (COLT '95).
Estimating the Expected Error of Empirical Minimizers for Model Selection. AAAI '98/IAAI '98, Proceedings of the Fifteenth National Conference on Artificial Intelligence / Tenth Conference on Innovative Applications of Artificial Intelligence.
A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization. ICML '98, Proceedings of the Fifteenth International Conference on Machine Learning.
Expected Error Analysis for Model Selection. ICML '99, Proceedings of the Sixteenth International Conference on Machine Learning.
Process-Oriented Estimation of Generalization Error. IJCAI '99, Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
Tractable Average-Case Analysis of Naive Bayesian Classifiers. ICML '99, Proceedings of the Sixteenth International Conference on Machine Learning.
Finding Association Rules That Trade Support Optimally against Confidence. PKDD '01, Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery.
Average-Case Analysis of Classification Algorithms for Boolean Functions and Decision Trees. ALT '00, Proceedings of the 11th International Conference on Algorithmic Learning Theory.
We discuss the problem of choosing the complexity of a decision tree (measured by its number of leaf nodes) so as to obtain the highest generalization performance. We first present an analysis of the generalization error of decision trees that offers a new perspective on the regularization parameter inherent to any regularization (e.g., pruning) algorithm. Every learning problem has an optimal setting of this parameter; a setting that does well for one problem will inevitably do poorly for others. We show that the optimal setting can in fact be estimated from the sample itself, without "trying out" various settings on holdout data. This leads to a nonparametric decision tree regularization algorithm that can, in principle, work well for all learning problems.
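The trade-off governed by the regularization parameter can be made concrete with a minimal sketch. This is not the paper's estimator: it only illustrates cost-complexity-style selection, where each candidate pruned subtree is summarized by its empirical error and leaf count, and the candidate minimizing error plus alpha times leaves is chosen. The candidate values below are invented for illustration.

```python
# Illustrative sketch (not the paper's algorithm): selecting a tree
# complexity by minimizing  empirical_error + alpha * n_leaves.
# Candidates summarize hypothetical pruned subtrees; larger trees
# fit the training sample better but are penalized more heavily.

candidates = [
    # (n_leaves, empirical_error) -- made-up values
    (2, 0.30),
    (4, 0.18),
    (8, 0.12),
    (16, 0.10),
    (32, 0.09),
]

def select(alpha, candidates):
    """Return the (n_leaves, error) pair minimizing error + alpha * leaves."""
    return min(candidates, key=lambda c: c[1] + alpha * c[0])

# A large alpha favors small trees; a small alpha favors large ones.
for alpha in (0.05, 0.01, 0.001):
    leaves, err = select(alpha, candidates)
    print(f"alpha={alpha}: {leaves} leaves (training error {err})")
```

Each fixed alpha picks a different complexity, which is exactly why a setting that suits one learning problem fails on another; the abstract's point is that the right alpha can be estimated from the sample rather than tuned on holdout data.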