Discretizing continuous attributes using information theory

Authors:
Chang-Hwan Lee
Affiliations:
Department of Information and Communications, DongGuk University, Seoul, Korea
Venue:
ISCIS'05: Proceedings of the 20th International Conference on Computer and Information Sciences
Year:
2005

References:
Maximizing the predictive value of production rules. Artificial Intelligence.
C4.5: Programs for Machine Learning.
On Changing Continuous Attributes into Ordered Discrete Attributes. EWSL '91: Proceedings of the European Working Session on Machine Learning.
Khiops: A Statistical Discretization Method of Continuous Attributes. Machine Learning.

Abstract

Many classification algorithms require training examples to contain only discrete values. To use these algorithms when some attributes have continuous numeric values, the numeric attributes must first be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. The amount of information each interval gives about the target attribute is measured with the Hellinger divergence, and the interval boundaries are chosen so that each interval contains as nearly equal an amount of information as possible. To compare our method with existing discretization methods, we discretize several popular classification data sets and use the naive Bayesian classifier and C4.5 as classification tools, comparing the accuracy obtained with our discretization against that obtained with other methods.
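The abstract gives neither the exact information formula nor the boundary-search procedure, so the following is only a minimal sketch of the idea under stated assumptions. It assumes (hypothetically) that the information of an interval I is the Hellinger distance between the class distribution inside the interval and the overall class distribution, weighted by the interval's share of the N examples: info(I) = (|I| / N) * sqrt( sum over classes c of ( sqrt(P(c | I)) - sqrt(P(c)) )^2 ). It also assumes that "as nearly equal as possible" means the smallest standard deviation of info(I) across a caller-chosen number of intervals k. The names hellinger_info and equal_information_cuts are illustrative and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import pstdev


def hellinger_info(labels_in_interval, prior, n_total):
    """Hypothetical interval information: Hellinger distance between
    P(c | interval) and the prior P(c), weighted by the interval's share
    of all examples (the weighting is an assumption, not from the paper)."""
    n = len(labels_in_interval)
    counts = Counter(labels_in_interval)
    h2 = sum(
        (math.sqrt(counts[c] / n) - math.sqrt(p)) ** 2
        for c, p in prior.items()
    )
    return (n / n_total) * math.sqrt(h2)


def equal_information_cuts(values, labels, k):
    """Brute-force sketch: pick k-1 cut points so that the k intervals'
    information values have the smallest population standard deviation.
    Exponential in k; intended for illustration on small inputs only."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(ys)
    prior = {c: cnt / n for c, cnt in Counter(ys).items()}
    # Candidate boundaries: positions between two distinct adjacent values,
    # so equal values are never split across intervals.
    candidates = [i for i in range(1, n) if xs[i - 1] != xs[i]]
    best = None
    for cut_idx in combinations(candidates, k - 1):
        bounds = (0, *cut_idx, n)
        infos = [
            hellinger_info(ys[a:b], prior, n)
            for a, b in zip(bounds, bounds[1:])
        ]
        spread = pstdev(infos)  # how unequal the intervals' information is
        if best is None or spread < best[0]:
            best = (spread, cut_idx, infos)
    if best is None:
        raise ValueError("not enough distinct values for k intervals")
    _, cut_idx, infos = best
    # Report each cut as the midpoint between neighbouring distinct values.
    cuts = [(xs[i - 1] + xs[i]) / 2 for i in cut_idx]
    return cuts, infos
```

For example, equal_information_cuts([1, 2, 3, 4, 5, 6, 7, 8], ["a"] * 4 + ["b"] * 4, 2) returns the single cut 4.5, which separates the two pure class blocks and gives both intervals the same information value. Exhaustive search over combinations is only workable for toy inputs; a practical version would need a faster search and a rule for choosing k, neither of which the abstract specifies.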