C4.5: Programs for Machine Learning.
An Experimental and Theoretical Comparison of Model Selection Methods. Machine Learning, special issue on the Eighth Annual Conference on Computational Learning Theory (COLT '95).
Estimating the Expected Error of Empirical Minimizers for Model Selection. AAAI '98/IAAI '98, Proceedings of the Fifteenth National Conference on Artificial Intelligence / Tenth Conference on Innovative Applications of Artificial Intelligence.
A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization. ICML '98, Proceedings of the Fifteenth International Conference on Machine Learning.
Expected Error Analysis for Model Selection. ICML '99, Proceedings of the Sixteenth International Conference on Machine Learning.
Process-Oriented Estimation of Generalization Error. IJCAI '99, Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
Tractable Average-Case Analysis of Naive Bayesian Classifiers. ICML '99, Proceedings of the Sixteenth International Conference on Machine Learning.
Finding Association Rules That Trade Support Optimally against Confidence. PKDD '01, Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery.
Average-Case Analysis of Classification Algorithms for Boolean Functions and Decision Trees. ALT '00, Proceedings of the 11th International Conference on Algorithmic Learning Theory.
We discuss the problem of choosing the complexity of a decision tree (measured by its number of leaf nodes) so as to obtain the highest generalization performance. We first present an analysis of the generalization error of decision trees that offers a new perspective on the regularization parameter inherent to any regularization (e.g., pruning) algorithm. Every learning problem has an optimal setting of this parameter; a setting that does well for one problem will inevitably do poorly for others. We show that the optimal setting can in fact be estimated from the sample itself, without "trying out" various settings on holdout data. This leads to a nonparametric decision tree regularization algorithm that can, in principle, work well for all learning problems.
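The trade-off governed by the regularization parameter can be made concrete with a minimal sketch. This is not the paper's estimator: it only illustrates cost-complexity-style selection, where each candidate pruned subtree is summarized by its empirical error and leaf count, and the candidate minimizing error plus alpha times leaves is chosen. The candidate values below are invented for illustration.

```python
# Illustrative sketch (not the paper's algorithm): selecting a tree
# complexity by minimizing  empirical_error + alpha * n_leaves.
# Candidates summarize hypothetical pruned subtrees; larger trees
# fit the training sample better but are penalized more heavily.

candidates = [
    # (n_leaves, empirical_error) -- made-up values
    (2, 0.30),
    (4, 0.18),
    (8, 0.12),
    (16, 0.10),
    (32, 0.09),
]

def select(alpha, candidates):
    """Return the (n_leaves, error) pair minimizing error + alpha * leaves."""
    return min(candidates, key=lambda c: c[1] + alpha * c[0])

# A large alpha favors small trees; a small alpha favors large ones.
for alpha in (0.05, 0.01, 0.001):
    leaves, err = select(alpha, candidates)
    print(f"alpha={alpha}: {leaves} leaves (training error {err})")
```

Each fixed alpha picks a different complexity, which is exactly why a setting that suits one learning problem fails on another; the abstract's point is that the right alpha can be estimated from the sample rather than tuned on holdout data.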