Minimum splits based discretization for continuous features

  • Authors:
  • Ke Wang; Han Chong Goh

  • Affiliations:
  • Dept. of Information Systems and Computer Science, National University of Singapore, Singapore; Dept. of Information Systems and Computer Science, National University of Singapore, Singapore

  • Venue:
  • IJCAI'97 Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1997

Abstract

Discretization refers to splitting the range of continuous values into intervals so as to provide useful information about classes. This is usually done by minimizing a goodness measure, subject to constraints such as the maximal number of intervals, the minimal number of examples per interval, or some stopping criterion for splitting. We take a different approach by searching for minimum splits that minimize the number of intervals with respect to a threshold of impurity (i.e., badness). We propose a "total entropy" motivated selection of the "best" split from minimum splits, without requiring additional constraints. Experiments show that the proposed method produces better decision trees.
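
The idea of a minimum split can be illustrated with a small sketch. The abstract does not give the algorithm, so everything below is an assumption made for illustration: impurity is taken to be the class entropy of an interval, the threshold applies to each interval separately, and a simple dynamic program (not necessarily the authors' search procedure) finds one split with the fewest intervals. The function names `entropy` and `minimum_split` are hypothetical, and the "total entropy" selection among several minimum splits is not implemented here.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def minimum_split(values, labels, threshold):
    """Return cut points of one split with the fewest intervals such that
    every interval has class entropy <= threshold.

    Assumption: per-interval entropy is the impurity measure; the paper
    may define impurity differently. Returns None if no split qualifies.
    """
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    index = {c: k for k, c in enumerate(classes)}
    n = len(pairs)

    # prefix[i][k] = number of class-k examples among the first i sorted examples
    prefix = [[0] * len(classes)]
    for _, lab in pairs:
        row = prefix[-1][:]
        row[index[lab]] += 1
        prefix.append(row)

    def can_cut(i):
        # A boundary may only fall between two distinct feature values.
        return i == 0 or i == n or pairs[i - 1][0] != pairs[i][0]

    inf = float("inf")
    best = [inf] * (n + 1)  # best[j] = fewest intervals covering the first j examples
    back = [-1] * (n + 1)   # back[j] = start index of the last interval
    best[0] = 0
    for j in range(1, n + 1):
        if not can_cut(j):
            continue
        for i in range(j):
            if best[i] == inf or not can_cut(i):
                continue
            counts = [a - b for a, b in zip(prefix[j], prefix[i])]
            if entropy(counts) <= threshold and best[i] + 1 < best[j]:
                best[j] = best[i] + 1
                back[j] = i

    if best[n] == inf:
        return None

    # Walk back through the chosen intervals; place each cut at a midpoint.
    cuts = []
    j = n
    while j > 0:
        i = back[j]
        if i > 0:
            cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
        j = i
    return sorted(cuts)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(minimum_split(xs, ys, threshold=0.0))
    print(minimum_split(xs, ys, threshold=1.0))
```

With a threshold of 0 the sketch needs two pure intervals for the toy data, while a threshold of 1 bit lets a single interval cover everything. When several splits reach the same minimum number of intervals, this sketch simply keeps the first one it finds; the paper instead proposes a "total entropy" motivated choice among them.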