Instability of Decision Tree Classification Algorithms

  • Authors:
  • Ruey Li

  • Venue:
  • Instability of Decision Tree Classification Algorithms
  • Year:
  • 2001

Abstract

Fundamental theorems are derived for the instability problem of decision tree classification algorithms: small changes in the input training samples may cause dramatically large changes in the output tree classifiers. Past research emphasized the instability of predictions rather than changes in tree structure, even though the stability of the tree structure is more important for providing consistent, stable, and insightful information to support decision making. We present theorems that establish the relationship between a data change and the resulting tree structure change (i.e., a split change). Based on these theorems, the relative sensitivity between two splits is defined as the smallest change that may cause the superior split to become inferior. A split is defined to be almost as good as another split if the relative sensitivity of the two splits is small. The Instability Theorem identifies the cause of the instability problem. Algorithms are presented to lessen the instability problem. Empirical results illustrate that the trees constructed by the proposed algorithm are more stable, noise-tolerant, informative, expressive, and concise. The proposed sensitivity measure can be used as a metric to evaluate the stability of splitting predicates. Tree sensitivity is an indicator of the confidence level in rules and of the effective lifetime of rules.
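
The idea of a "smallest change that may cause the superior split to become inferior" can be illustrated with a toy brute-force sketch. This is not the paper's definition or algorithm. It rests on assumptions chosen only for illustration: the change is measured as the number of flipped class labels, splits are scored with weighted Gini impurity, and the features `a` and `b`, the labels `y`, and the function names are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of 0/1 labels, computed exactly."""
    n = len(labels)
    if n == 0:
        return Fraction(0)
    p = Fraction(sum(labels), n)
    return 2 * p * (1 - p)


def split_impurity(feature, labels):
    """Weighted Gini impurity after splitting on a binary feature (lower is better)."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def flips_to_overturn(superior, inferior, labels):
    """Smallest number of label flips after which `inferior` strictly beats `superior`.

    Assumption: a label flip is the unit of data change. The search is
    exhaustive, so it is only practical for very small samples.
    """
    n = len(labels)
    for k in range(n + 1):
        for idx in combinations(range(n), k):
            flipped = [1 - y if i in idx else y for i, y in enumerate(labels)]
            if split_impurity(inferior, flipped) < split_impurity(superior, flipped):
                return k, idx
    return None


# Hypothetical toy sample: feature `a` separates the classes perfectly,
# feature `b` disagrees with `a` on two of the eight rows.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 0, 1, 1, 1, 1, 0]
y = [0, 0, 0, 0, 1, 1, 1, 1]

result = flips_to_overturn(a, b, y)
print("split a impurity:", split_impurity(a, y))
print("split b impurity:", split_impurity(b, y))
print("flips needed for b to overtake a:", result[0])
```

On this sample, flipping two of the eight labels is enough for the split on `b` to overtake the split on `a`, which is the kind of small data change the abstract describes as altering the tree structure. Under the stated assumptions, a small flip count relative to the sample size would mark the two splits as close competitors.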