Instability of decision tree classification algorithms

  • Authors: Ruey-Hsia Li; Geneva G. Belford
  • Affiliations: Lightspeed Semiconductor, Sunnyvale, CA; University of Illinois at Urbana-Champaign, Urbana, IL
  • Venue: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year: 2002

Abstract

Decision tree classification algorithms suffer from an instability problem: small changes in the input training samples may cause dramatically large changes in the output classification rules. Different rules generated from almost the same training samples run counter to human intuition and complicate decision making. In this paper, we present fundamental theorems for the instability problem of decision tree classifiers. The first theorem gives the relationship between a data change and the resulting change in tree structure (i.e., a split change). The second theorem, the Instability Theorem, identifies the cause of the instability problem. Based on these two theorems, algorithmic improvements can be made to lessen the instability problem. Empirical results illustrate the theorem statements. The trees constructed by the proposed algorithm are more stable, noise-tolerant, informative, expressive, and concise. Our proposed sensitivity measure can be used as a metric to evaluate the stability of splitting predicates. The tree sensitivity is an indicator of the confidence level in the rules and of their effective lifetime.
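
The instability described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or their sensitivity measure; it assumes a CART-style root split chosen by lowest weighted Gini impurity, and the eight-sample dataset, the attribute names a and b, and the helper functions are all hypothetical. Flipping the label of a single training sample is enough to change which attribute is selected at the root.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, attr):
    """Size-weighted Gini impurity of the branches produced by splitting on attr."""
    n = len(rows)
    total = 0.0
    for value in {row[attr] for row in rows}:
        branch = [row["y"] for row in rows if row[attr] == value]
        total += len(branch) / n * gini(branch)
    return total

def best_split(rows, attrs=("a", "b")):
    """Attribute with the lowest weighted Gini impurity (the root split)."""
    return min(attrs, key=lambda attr: weighted_gini(rows, attr))

# Hypothetical toy training set: two binary attributes and a binary label.
original = [
    {"a": a, "b": b, "y": y}
    for a, b, y in [
        (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1),
        (1, 0, 0), (1, 1, 1), (1, 1, 1), (1, 1, 1),
    ]
]

# Perturbed copy: only the label of the fifth sample is flipped from 0 to 1.
perturbed = [dict(row) for row in original]
perturbed[4]["y"] = 1

print(best_split(original))   # b
print(best_split(perturbed))  # a
```

On the original data, splitting on b gives a weighted impurity of 0.2 against 0.375 for a; after the single label flip, a drops to 0.1875 while b rises to about 0.367, so the root predicate, and with it every rule derived from the tree, changes.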