Many supervised induction algorithms require discrete data; real data, however, often contains both discrete and continuous attributes. High-quality discretization of continuous attributes is an important problem that affects the accuracy, complexity, variance, and understandability of the induced model. Discretization, like other statistical procedures, is usually applied to a sample of the population, since the entire population is practically inaccessible. For this reason, we argue that a discretization computed on a sample is only an estimate of the discretization for the entire population. Most existing discretization methods partition the attribute range into two or more intervals using one or more cut points. In this paper, we introduce two variants of a resampling technique (bootstrapping) to generate a set of candidate discretization points, thereby improving discretization quality by providing a better estimate with respect to the entire population. The goal of this paper is thus to observe whether this type of resampling can yield better-quality discretization points, which opens up a new paradigm for the construction of soft decision trees.
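To make the idea concrete, the sketch below shows one plausible way bootstrap resampling could generate candidate cut points. It is an illustration only, not the authors' exact procedure: the information-gain criterion, function names, and parameters (`n_boot`, `seed`) are assumptions introduced here for demonstration.

```python
import numpy as np

def best_cut_point(x, y):
    """Return the cut on attribute x that maximizes information gain
    for class labels y (a simple entropy-based supervised criterion;
    assumed here, not necessarily the paper's scoring function)."""
    order = np.argsort(x)
    x, y = x[order], y[order]

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    base = entropy(y)
    best_gain, best_cut = -np.inf, None
    # Candidate cuts: midpoints between consecutive distinct values.
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue
        left, right = y[:i], y[i:]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best_gain, best_cut = gain, (x[i - 1] + x[i]) / 2
    return best_cut

def bootstrap_cut_points(x, y, n_boot=100, seed=0):
    """Draw bootstrap replicates of (x, y) and collect the best cut
    found in each; the pooled set serves as candidate discretization
    points estimating the population-level cut."""
    rng = np.random.default_rng(seed)
    cuts = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
        cut = best_cut_point(x[idx], y[idx])
        if cut is not None:
            cuts.append(cut)
    return np.array(cuts)
```

Under this reading, the spread of the pooled cut points reflects the sampling variance of the discretization, and an aggregate (e.g., their mean, or the full set retained as soft interval boundaries) serves as a more stable estimate than a single cut computed from one sample.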