Post-pruning in decision tree induction using multiple performance measures

  • Authors: Kweku-Muata Osei-Bryson
  • Affiliations: Department of Information Systems, The Information Systems Research Institute, Virginia Commonwealth University, Richmond, VA 23284, USA
  • Venue: Computers and Operations Research
  • Year: 2007

Abstract

The decision tree (DT) induction process has two major phases: the growth phase and the pruning phase. The pruning phase aims to generalize the DT produced in the growth phase by generating a sub-tree that avoids over-fitting to the training data. Most post-pruning methods essentially treat post-pruning as a single-objective problem (i.e., maximize validation accuracy) and consider simplicity (in terms of the number of leaves) only in the case of a tie. However, it is well known that, apart from accuracy, other performance measures (e.g., stability, simplicity, interpretability) are important for evaluating DT quality. In this paper, we propose that multi-objective evaluation be done during the post-pruning phase in order to select the best sub-tree, and we propose a procedure for obtaining the optimal sub-tree based on user-provided preference and value function information.
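
To make the idea of multi-objective sub-tree selection concrete, the following is a minimal sketch and not the procedure from the paper. It assumes a simple additive value function with user-supplied weights, and it assumes each candidate sub-tree has already been scored on validation accuracy, a stability measure, and leaf count. The names `Candidate`, `simplicity`, `score`, and `select_subtree`, the weights, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate sub-tree with its (hypothetical) performance measures."""
    name: str
    accuracy: float   # validation accuracy in [0, 1]
    stability: float  # assumed stability score in [0, 1]
    leaves: int       # number of leaves; fewer leaves means a simpler tree


def simplicity(leaves: int, max_leaves: int) -> float:
    """Map a leaf count to [0, 1], where 1.0 corresponds to a single leaf."""
    if max_leaves <= 1:
        return 1.0
    return 1.0 - (leaves - 1) / (max_leaves - 1)


def score(c: Candidate, weights: dict, max_leaves: int) -> float:
    """Additive value function (an assumption): weighted sum of the measures."""
    return (weights["accuracy"] * c.accuracy
            + weights["stability"] * c.stability
            + weights["simplicity"] * simplicity(c.leaves, max_leaves))


def select_subtree(candidates: list, weights: dict) -> Candidate:
    """Return the candidate with the highest overall value."""
    max_leaves = max(c.leaves for c in candidates)
    return max(candidates, key=lambda c: score(c, weights, max_leaves))


# Illustrative, made-up candidates (not results from the paper).
candidates = [
    Candidate("full", accuracy=0.86, stability=0.70, leaves=21),
    Candidate("pruned_a", accuracy=0.85, stability=0.80, leaves=11),
    Candidate("pruned_b", accuracy=0.80, stability=0.90, leaves=5),
]

# Hypothetical user preferences; the weights sum to 1.
weights = {"accuracy": 0.6, "stability": 0.2, "simplicity": 0.2}
best = select_subtree(candidates, weights)
print(best.name)
```

With these made-up numbers, a purely accuracy-driven choice (weight 1.0 on accuracy) would keep the full tree, while the mixed weights favor a smaller, more stable sub-tree. The additive form is only one possible value function; other forms of preference information would change the ranking logic.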