Predicting Nearly As Well As the Best Pruning of a Decision Tree

  • Authors:
  • David P. Helmbold; Robert E. Schapire

  • Affiliations:
  • Computer and Information Sciences, University of California, Santa Cruz, CA 95064. E-mail: dph@cse.ucsc.edu; AT&T Labs, 600 Mountain Avenue, Room 2A-424, Murray Hill, NJ 07974. E-mail: schapire@research.att.com

  • Venue:
  • Machine Learning - Special issue on the Eighth Annual Conference on Computational Learning Theory (COLT '95)
  • Year:
  • 1997

Abstract

Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown which typically ends up “over-fitting” the data. To reduce over-fitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data.

In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm’s performance will not be “much worse” (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust.

Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.’s (1993) results on predicting using “expert advice” (where we view each pruning as an “expert”) to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine (1990, 1992) and Willems, Shtarkov and Tjalkens (1993, 1995) to derive a very efficient implementation of this procedure.
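To make the “expert advice” view concrete, the following is a minimal sketch of a generic exponential-weights scheme in which each pruning is treated as an expert: predictions are a weighted vote over prunings, and each pruning is down-weighted according to its loss after the true label is revealed. This is an illustrative sketch only, not the paper’s algorithm; the function names, the representation of prunings as callables, and the learning rate eta are assumptions introduced here for illustration.

    import math

    def weighted_expert_prediction(prunings, weights, x):
        """Predict on x by a weighted vote over all prunings (the 'experts')."""
        total = sum(weights)
        vote = sum(w * p(x) for p, w in zip(prunings, weights))
        return 1 if vote / total >= 0.5 else 0

    def update_weights(prunings, weights, x, y, eta=0.5):
        """After seeing the true label y, exponentially penalize each pruning
        in proportion to its prediction loss on (x, y)."""
        return [w * math.exp(-eta * abs(p(x) - y))
                for p, w in zip(prunings, weights)]

    # Toy usage: two hypothetical "prunings" as 0/1 classifiers on one feature.
    prunings = [lambda x: 1 if x[0] > 0 else 0,   # non-trivial pruning
                lambda x: 1]                       # trivial pruning (always 1)
    weights = [1.0, 1.0]                           # uniform initial weights
    for x, y in [([0.3], 1), ([-0.2], 0), ([0.8], 1)]:
        print(weighted_expert_prediction(prunings, weights, x), y)
        weights = update_weights(prunings, weights, x, y)

Run explicitly like this, the scheme is infeasible for real trees, since the number of prunings grows exponentially with tree size; the paper’s contribution is an efficient implementation that maintains these weights implicitly over all prunings, using the technique it attributes to Buntine and to Willems, Shtarkov and Tjalkens.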