In discretization of a continuous variable, its numerical value range is divided into a small number of intervals that are then used in classification. Naïve Bayes, for example, can benefit from this preprocessing. A commonly used supervised discretization method is Fayyad and Irani's recursive entropy-based splitting of a value range. The technique uses ent-mdl as a model selection criterion to decide whether to accept a proposed split. We argue that, theoretically, this criterion is not always close to ideal for the task, and our empirical experiments support this finding. We give a statistical rule that does not rely on the ad-hoc rule used in Fayyad and Irani's approach to improve its performance; this rule, however, is quite time-consuming to compute. We also demonstrate that a very simple Bayesian method performs better than ent-mdl as a model selection criterion.
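To make the procedure discussed above concrete, the following is a minimal Python sketch of recursive entropy-based splitting with an MDL-style stopping test in the spirit of Fayyad and Irani's ent-mdl criterion. It is an illustration under stated assumptions, not the authors' implementation; the function names (entropy, mdl_accepts, best_cut, discretize) are invented for this example.

import numpy as np

def entropy(labels):
    # Class entropy (base 2) of a label array.
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mdl_accepts(y, y_left, y_right):
    # ent-mdl test: accept the split iff the information gain exceeds
    # the MDL-derived threshold (following Fayyad and Irani, 1993).
    n = len(y)
    gain = entropy(y) - (len(y_left) / n) * entropy(y_left) \
                      - (len(y_right) / n) * entropy(y_right)
    k, k1, k2 = (len(np.unique(v)) for v in (y, y_left, y_right))
    delta = np.log2(3**k - 2) - (k * entropy(y)
                                 - k1 * entropy(y_left)
                                 - k2 * entropy(y_right))
    return gain > (np.log2(n - 1) + delta) / n

def best_cut(x, y):
    # Return the cut point that minimizes class-weighted entropy, or None.
    order = np.argsort(x)
    x, y = x[order], y[order]
    best, best_score = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue
        score = (i * entropy(y[:i]) + (len(y) - i) * entropy(y[i:])) / len(y)
        if score < best_score:
            best, best_score = (x[i - 1] + x[i]) / 2, score
    return best

def discretize(x, y, cuts=None):
    # Recursively split the value range while the MDL criterion accepts.
    if cuts is None:
        cuts = []
    t = best_cut(x, y)
    if t is None:
        return cuts
    left, right = x <= t, x > t
    if mdl_accepts(y, y[left], y[right]):
        cuts.append(t)
        discretize(x[left], y[left], cuts)
        discretize(x[right], y[right], cuts)
    return sorted(cuts)

Calling discretize on a numeric feature and its class labels returns the accepted cut points, which define the intervals a classifier such as Naïve Bayes would then use in place of the raw values.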