Nonparametric Regularization of Decision Trees

  • Authors:
  • Tobias Scheffer

  • Venue:
  • ECML '00 Proceedings of the 11th European Conference on Machine Learning
  • Year:
  • 2000

Abstract

We discuss the problem of choosing the complexity of a decision tree (measured by the number of leaf nodes) that yields the highest generalization performance. We first present an analysis of the generalization error of decision trees that gives a new perspective on the regularization parameter inherent to any regularization (e.g., pruning) algorithm. There is an optimal setting of this parameter for every learning problem; a setting that does well for one problem will inevitably do poorly for others. We will see that the optimal setting can in fact be estimated from the sample, without "trying out" various settings on holdout data. This leads to a nonparametric decision tree regularization algorithm that can, in principle, work well for all learning problems.