Trading Accuracy for Simplicity in Decision Trees

  • Authors:
  • Marko Bohanec
  • Ivan Bratko

  • Affiliations:
  • “Jožef Stefan” Institute, Jamova 39, SI-61111 Ljubljana, Slovenia. MARKO.BOHANEC@IJS.SI
  • University of Ljubljana, Faculty of Electrical and Computer Engineering, Tržaška 25, SI-61000 Ljubljana, Slovenia. IVAN.BRATKO@NINURTA.FER.UNI-LJ.SI

  • Venue:
  • Machine Learning
  • Year:
  • 1994

Abstract

When communicating concepts, it is often convenient or even necessary to define a concept approximately. A simple, although only approximately accurate, concept definition may be more useful than a completely accurate definition that involves a lot of detail. This paper addresses the following problem: given a completely accurate but complex definition of a concept, simplify the definition, possibly at the expense of accuracy, so that the simplified definition still corresponds to the concept “sufficiently” well. Concepts are represented by decision trees, and the method of simplification is tree pruning. Given a decision tree that accurately specifies a concept, the problem is to find a smallest pruned tree that still represents the concept within some specified accuracy. A pruning algorithm is presented that finds an optimal solution by generating a dense sequence of pruned trees, decreasing in size, such that each tree has the highest accuracy among all possible pruned trees of the same size. An efficient implementation of the algorithm, based on dynamic programming, is presented and empirically compared with three progressive pruning algorithms using both artificial and real-world data. An interesting empirical finding is that the real-world data generally allow significantly greater simplification at equal loss of accuracy.
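The dynamic-programming idea behind optimal pruning can be illustrated with a minimal sketch. It assumes a hypothetical `Node` type that stores the class counts of the training examples reaching each node, measures tree size as the number of leaves, and measures accuracy by the number of misclassified examples. This is not the authors' published implementation, and the paper's own accuracy measure may differ. For every subtree, the hypothetical `min_errors_by_size` returns the fewest errors attainable for each attainable number of leaves, combining the children's tables with a min-plus (knapsack-style) convolution. The hypothetical `smallest_size_within` then picks the smallest size whose error rate stays within a given bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Hypothetical representation: class distribution of the training
    # examples reaching this node, e.g. {"+": 5, "-": 1}. An internal
    # node's counts are assumed to equal the sum of its children's counts.
    counts: Dict[str, int]
    children: List["Node"] = field(default_factory=list)

    def leaf_errors(self) -> int:
        # Errors made if this node is turned into a leaf labelled
        # with its majority class.
        return sum(self.counts.values()) - max(self.counts.values())


def min_errors_by_size(node: Node) -> Dict[int, int]:
    """Map each attainable leaf count k to the fewest errors of any
    pruned version of `node` that has exactly k leaves."""
    # Pruning at this node always yields a single leaf.
    best: Dict[int, int] = {1: node.leaf_errors()}
    if not node.children:
        return best
    # Keeping the split: every child keeps at least one leaf, so sizes
    # add up across children. Combine their tables by min-plus convolution.
    combined: Dict[int, int] = {0: 0}
    for child in node.children:
        child_table = min_errors_by_size(child)
        merged: Dict[int, int] = {}
        for k1, e1 in combined.items():
            for k2, e2 in child_table.items():
                k, e = k1 + k2, e1 + e2
                if e < merged.get(k, float("inf")):
                    merged[k] = e
        combined = merged
    # For each size, keep the better of "prune here" and "keep the split".
    for k, e in combined.items():
        if e < best.get(k, float("inf")):
            best[k] = e
    return best


def smallest_size_within(root: Node, max_error_rate: float) -> int:
    """Smallest leaf count whose best pruned tree meets the error bound."""
    total = sum(root.counts.values())
    table = min_errors_by_size(root)
    for k in sorted(table):
        if table[k] / total <= max_error_rate:
            return k
    raise ValueError("no pruned tree meets the requested error rate")


# Toy tree: the unpruned tree (four leaves) classifies all ten examples correctly.
root = Node({"+": 6, "-": 4}, [
    Node({"+": 5, "-": 1}, [Node({"+": 5, "-": 0}), Node({"+": 0, "-": 1})]),
    Node({"+": 1, "-": 3}, [Node({"+": 1, "-": 0}), Node({"+": 0, "-": 3})]),
])

print(min_errors_by_size(root))         # {1: 4, 2: 2, 3: 1, 4: 0}
print(smallest_size_within(root, 0.1))  # 3
```

On this toy tree, tolerating a 10% error rate shrinks the four-leaf tree to three leaves, and tolerating 25% shrinks it to two. Sizes that cannot be reached (for example, two leaves under a node with three children) simply do not appear in a node's table, which is why a dictionary is used rather than a fixed-length array. Recovering the pruned tree itself for a chosen size would additionally require recording, at each node, which division of the leaf budget among the children achieved the minimum.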