An Efficient Method for Discretizing Continuous Attributes

  • Authors:
  • Kelley M. Engle; Aryya Gangopadhyay

  • Affiliations:
  • University of Maryland Baltimore County, USA; University of Maryland Baltimore County, USA

  • Venue:
  • International Journal of Data Warehousing and Mining
  • Year:
  • 2010

Abstract

In this paper, the authors present a novel method for finding optimal split points for discretizing continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned using a bisecting region method that partitions the search space and returns the point with the highest information gain found during its search. The second step is a hill-climbing algorithm that starts from the point returned by the first step and greedily searches for an optimal point. The method was tested on fifteen attributes from two data sets. The results show that it drastically reduces the number of searches while still identifying optimal or near-optimal split points. On average, the number of information gain calculations fell by 98%, at the cost of only a 4% reduction in information gain.
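The two-step search described above can be illustrated with a small sketch. The abstract does not specify how the bisecting region method partitions the space, which candidate points are considered, or what counts as a neighbor during hill climbing, so those details here are assumptions: candidates are midpoints between adjacent distinct sorted values, the bisecting step probes one point in each half of the current range and descends into the half whose probe scores higher, and hill climbing moves between adjacent candidates. The function names (`candidate_splits`, `info_gain`, `find_split`, `exhaustive_split`) are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, split):
    """Information gain of the binary split `value <= split` vs `value > split`."""
    left = [y for x, y in zip(values, labels) if x <= split]
    right = [y for x, y in zip(values, labels) if x > split]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

def candidate_splits(values):
    """Assumed candidate set: midpoints between adjacent distinct sorted values."""
    xs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]

def find_split(values, labels):
    """Hypothetical two-step search; returns (split, gain, number of gain evaluations)."""
    cands = candidate_splits(values)
    if not cands:
        return None, 0.0, 0
    cache = {}  # memoize so each candidate's gain is computed at most once

    def gain_at(i):
        if i not in cache:
            cache[i] = info_gain(values, labels, cands[i])
        return cache[i]

    # Step 1 (assumed form of the bisecting region method): probe one point in
    # each half of the current range, remember the best point seen, and keep
    # only the half whose probe has the higher gain.
    lo, hi = 0, len(cands) - 1
    best = (lo + hi) // 2
    while lo < hi:
        mid = (lo + hi) // 2
        left_probe = (lo + mid) // 2
        right_probe = (mid + 1 + hi) // 2
        for i in (left_probe, right_probe):
            if gain_at(i) > gain_at(best):
                best = i
        if gain_at(left_probe) >= gain_at(right_probe):
            hi = mid
        else:
            lo = mid + 1

    # Step 2: greedy hill climbing over adjacent candidates, starting at `best`;
    # stop when no neighbor strictly improves the gain.
    i = best
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(cands)]
        nxt = max(neighbors, key=gain_at, default=i)
        if gain_at(nxt) <= gain_at(i):
            break
        i = nxt
    return cands[i], gain_at(i), len(cache)

def exhaustive_split(values, labels):
    """Baseline: evaluate every candidate split and return the best one."""
    cands = candidate_splits(values)
    gains = [info_gain(values, labels, c) for c in cands]
    k = max(range(len(cands)), key=gains.__getitem__)
    return cands[k], gains[k], len(cands)
```

For example, with `values = list(range(20))` and labels that switch from `'a'` to `'b'` at 12, both `find_split` and `exhaustive_split` return the split 11.5, but `find_split` computes information gain for fewer candidates than the 19 evaluated exhaustively. The memoizing cache is what makes the evaluation count meaningful: points revisited by the hill-climbing step are not recomputed.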