Post-pruning in regression tree induction: An integrated approach

  • Authors: Kweku-Muata Osei-Bryson

  • Affiliations: Department of Information Systems and The Information Systems Research Institute, Virginia Commonwealth University, Richmond, VA 23284, United States

  • Venue: Expert Systems with Applications: An International Journal
  • Year: 2008

Quantified Score

Hi-index 12.05

Abstract

The regression tree (RT) induction process has two major phases: the growth phase and the pruning phase. The pruning phase aims to generalize the RT generated in the growth phase by producing a subtree that avoids over-fitting to the training data. Most post-pruning methods essentially treat post-pruning as a single-objective problem (i.e., maximize validation accuracy), and consider simplicity (in terms of the number of leaves) only in the case of a tie. However, it is well known that, apart from accuracy, other performance measures (e.g., stability, simplicity) are important for evaluating decision tree (DT) quality. In this paper we present an integrated approach to the post-pruning phase that simultaneously accommodates multiple performance measures that are important for evaluating RT quality, and obtains the optimal subtree based on user-provided preference and value function information.
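The contrast drawn in the abstract can be illustrated with a small sketch. It is not the paper's procedure: the additive weighted value function, the linear simplicity scale, the `Subtree` fields, and all candidate scores and weights below are hypothetical assumptions chosen only to show how a single-objective rule and a multi-measure rule can select different subtrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtree:
    """A candidate pruned subtree with made-up (hypothetical) scores."""
    name: str
    leaves: int
    accuracy: float   # validation accuracy in [0, 1], higher is better
    stability: float  # stability score in [0, 1], higher is better


def pick_single_objective(candidates):
    # Conventional rule described in the abstract: maximize validation
    # accuracy, and use the number of leaves only to break ties.
    return max(candidates, key=lambda t: (t.accuracy, -t.leaves))


def simplicity(tree, max_leaves):
    # Assumed linear scale: 1 leaf maps to 1.0, max_leaves maps to 0.0.
    if max_leaves <= 1:
        return 1.0
    return (max_leaves - tree.leaves) / (max_leaves - 1)


def pick_weighted(candidates, weights):
    # Assumed additive value function over three measures; the weights
    # stand in for user-provided preference information.
    max_leaves = max(t.leaves for t in candidates)

    def value(t):
        return (weights["accuracy"] * t.accuracy
                + weights["stability"] * t.stability
                + weights["simplicity"] * simplicity(t, max_leaves))

    return max(candidates, key=value)


# Hypothetical candidates and weights, for illustration only.
candidates = [
    Subtree("full", leaves=20, accuracy=0.90, stability=0.60),
    Subtree("medium", leaves=8, accuracy=0.89, stability=0.85),
    Subtree("small", leaves=3, accuracy=0.80, stability=0.90),
]
weights = {"accuracy": 0.6, "stability": 0.3, "simplicity": 0.1}

print(pick_single_objective(candidates).name)
print(pick_weighted(candidates, weights).name)
```

With these illustrative numbers, the single-objective rule keeps the largest tree because it has the highest accuracy by a narrow margin, while the weighted rule prefers the mid-sized tree, which gives up little accuracy and scores better on stability and simplicity.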