A global optimal algorithm for class-dependent discretization of continuous data

  • Authors:
  • Lili Liu; Andrew K. C. Wong; Yang Wang

  • Affiliations:
  • Lili Liu, Andrew K. C. Wong: PAMI Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. E-mail: {lililiu, akcwong}@pami.uwaterloo.ca
  • Yang Wang: Pattern Discovery Software Systems, Ltd., 550 Parkside Drive, Unit B9, Waterloo, Ontario N2L 5V4, Canada. E-mail: yang@patterndiscovery.com

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2004

Abstract

This paper presents a new method for converting continuous variables into discrete variables for inductive machine learning. The method can be applied to pattern classification problems in machine learning and data mining. The discretization process is formulated as an optimization problem: the normalized mutual information, which measures the interdependence between the class labels and the variable to be discretized, serves as the objective function, and fractional programming (iterative dynamic programming) is used to find its optimum. Unlike the majority of class-dependent discretization methods in the literature, which find only a local optimum of their objective functions, the proposed method, Optimal Class-Dependent Discretization (OCDD), finds the global optimum. The experimental results demonstrate that the algorithm is effective in classification when coupled with popular learning systems such as C4.5 decision trees and the naive Bayes classifier. It can be used to discretize continuous variables for many existing inductive learning systems.
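To make the objective concrete, the sketch below scores a candidate set of cut points by the mutual information between class labels and interval assignments, normalized here by the joint entropy. This is a common normalization and a toy illustration only; the paper's exact normalization, and its fractional/dynamic programming search for the globally optimal cut points, are not reproduced here, and the data and function names are hypothetical.

```python
import numpy as np

def normalized_mutual_information(labels, bins):
    """Mutual information I(C;B) between class labels C and bin
    assignments B, normalized by the joint entropy H(C,B).
    (One common normalization; OCDD's exact definition may differ.)"""
    n = len(labels)
    # Joint distribution over (class, bin) pairs.
    joint = {}
    for c, b in zip(labels, bins):
        joint[(c, b)] = joint.get((c, b), 0) + 1
    p_joint = {k: v / n for k, v in joint.items()}
    # Marginal distributions.
    p_c, p_b = {}, {}
    for (c, b), p in p_joint.items():
        p_c[c] = p_c.get(c, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    mi = sum(p * np.log2(p / (p_c[c] * p_b[b]))
             for (c, b), p in p_joint.items())
    h_joint = -sum(p * np.log2(p) for p in p_joint.values())
    return mi / h_joint if h_joint > 0 else 0.0

def discretize(values, boundaries):
    """Map each continuous value to the index of its interval,
    given sorted cut points."""
    return np.searchsorted(boundaries, values)

# Hypothetical toy data: one variable that separates two classes at x = 0.5.
x = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = [0, 0, 0, 1, 1, 1]
nmi = normalized_mutual_information(y, discretize(x, [0.5]))
```

With the cut point at 0.5 the intervals align perfectly with the classes, so the score reaches its maximum of 1.0; a misplaced cut point (say at 0.25) mixes classes within an interval and scores lower. A discretization algorithm searches over cut-point sets to maximize this kind of objective.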