A Hellinger-based discretization method for numeric attributes in classification learning

Authors: Chang-Hwan Lee
Affiliations: Department of Information and Communications, DongGuk University, 3-26 Pil-Dong, Chung-Gu, Seoul 100-715, Republic of Korea
Venue: Knowledge-Based Systems
Year: 2007

Abstract

Many classification algorithms require that training examples contain only discrete values. To use these algorithms when some attributes have continuous numeric values, the numeric attributes must first be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. Our method is context-sensitive in the sense that it takes into account the value of the target attribute. The amount of information each interval gives to the target attribute is measured using the Hellinger divergence, and the interval boundaries are chosen so that the intervals contain as nearly equal amounts of information as possible. To compare our discretization method with some current discretization methods, several popular classification data sets are selected for discretization. We use the naive Bayesian classifier and C4.5 as classification tools to compare the accuracy of our discretization method with that of other methods.
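
As a rough illustration of the measure named in the abstract, the sketch below computes, for a given set of candidate cut points, the Hellinger divergence between the class distribution inside each interval and the overall class distribution. It is not the paper's algorithm: the abstract does not say how boundaries are searched, so the function names (`class_distribution`, `hellinger`, `interval_information`), the use of the prior class distribution as the reference, and the unnormalized form of the divergence (no 1/sqrt(2) factor) are all assumptions made for this example.

```python
import math
from collections import Counter


def class_distribution(labels, classes):
    """Relative frequency of each class in `labels`, in the order of `classes`."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[c] / n for c in classes]


def hellinger(p, q):
    """Unnormalized Hellinger divergence between two discrete distributions.

    Assumption: some texts scale this by 1/sqrt(2); this sketch does not.
    """
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def interval_information(values, labels, cut_points):
    """Divergence of each interval's class distribution from the overall one.

    Intervals are (lo, hi], built from the sorted cut points with open ends
    at -inf and +inf. An empty interval is assigned 0.0 (an assumption).
    """
    classes = sorted(set(labels))
    prior = class_distribution(labels, classes)
    bounds = [-math.inf] + sorted(cut_points) + [math.inf]
    info = []
    for lo, hi in zip(bounds, bounds[1:]):
        inside = [y for x, y in zip(values, labels) if lo < x <= hi]
        if not inside:
            info.append(0.0)
        else:
            info.append(hellinger(class_distribution(inside, classes), prior))
    return info


# Hypothetical toy data: one numeric attribute, two classes.
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "b"]
info = interval_information(values, labels, cut_points=[3.5])
```

A boundary search in the spirit of the abstract would then compare candidate sets of cut points and prefer those whose per-interval values are closest to one another; how that search is actually carried out is left open here because the abstract does not specify it.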