A comparison of six approaches to discretization: a rough set perspective

Authors:
Piotr Blajdo; Jerzy W. Grzymala-Busse; Zdzislaw S. Hippe; Maksymilian Knap; Teresa Mroczek; Lukasz Piatek
Affiliations:
Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, Rzeszow, Poland; Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS and Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland; Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, Rzeszow, Poland; Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, Rzeszow, Poland; Department of Distributed Systems, University of Information Technology and Management, Rzeszow, Poland; Department of Distributed Systems, University of Information Technology and Management, Rzeszow, Poland
Venue:
RSKT'08 Proceedings of the 3rd international conference on Rough sets and knowledge technology
Year:
2008

Abstract

We present the results of extensive experiments performed on nine data sets with numerical attributes using six promising discretization methods. For every method and every data set, 30 experiments of ten-fold cross validation were conducted, and the means and sample standard deviations were then computed. Our results show that for a specific data set it is essential to choose an appropriate discretization method, since the performance of discretization methods differs significantly. However, in general, among all of these discretization methods there is no statistically significant worst or best method. Thus, in practice, the best discretization method should be selected individually for a given data set.
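
The evaluation protocol described in the abstract (30 runs of ten-fold cross validation per method, summarized by the mean and the sample standard deviation) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the synthetic single-attribute data set, the equal-width discretizer, the majority-class predictor, and every function name (`make_data`, `equal_width_majority`, `ten_fold_error`, `repeated_cv`) are hypothetical stand-ins for the nine data sets, the six discretization methods, and the rule induction system used in the paper.

```python
import random
import statistics
from functools import partial


def make_data(n=200, seed=1):
    """Hypothetical toy data: one numerical attribute x, binary decision y."""
    rng = random.Random(seed)
    return [(x, int(x > 5.0)) for x in (rng.uniform(0.0, 10.0) for _ in range(n))]


def equal_width_majority(train, test, k=4):
    """Assumed stand-in for 'discretize, then induce rules'.

    Splits the training range of x into k equal-width intervals, predicts the
    majority decision of each interval, and returns the number of test errors.
    """
    xs = [x for x, _ in train]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k or 1.0  # guard against a zero-width range

    def interval(x):
        # Clamp so that test values outside the training range still get an interval.
        return min(k - 1, max(0, int((x - lo) / width)))

    counts = {}
    overall = {}
    for x, y in train:
        per_interval = counts.setdefault(interval(x), {})
        per_interval[y] = per_interval.get(y, 0) + 1
        overall[y] = overall.get(y, 0) + 1
    default = max(overall, key=overall.get)

    def predict(x):
        c = counts.get(interval(x))
        return max(c, key=c.get) if c else default

    return sum(predict(x) != y for x, y in test)


def ten_fold_error(data, train_and_test, rng):
    """One run of ten-fold cross validation; returns the overall error rate."""
    rows = data[:]
    rng.shuffle(rows)
    folds = [rows[i::10] for i in range(10)]
    errors = 0
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        errors += train_and_test(train, test)
    return errors / len(rows)


def repeated_cv(data, train_and_test, runs=30, seed=0):
    """Repeat ten-fold cross validation; return (mean, sample standard deviation)."""
    rng = random.Random(seed)
    rates = [ten_fold_error(data, train_and_test, rng) for _ in range(runs)]
    return statistics.mean(rates), statistics.stdev(rates)


if __name__ == "__main__":
    data = make_data()
    # Compare a few hypothetical discretization settings on the same data set.
    for k in (2, 3, 4):
        mean, sd = repeated_cv(data, partial(equal_width_majority, k=k))
        print(f"k={k}: mean error rate {mean:.4f}, sample std dev {sd:.4f}")
```

Running the same loop over several discretizers on one data set mirrors the practical recommendation above: the method with the lowest mean error rate, judged against the spread across runs, would be the one selected for that data set.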