This study attempts to identify the merits of six of the most popular discretization methods when applied to a randomly generated dataset whose attributes each conform to one of eight common statistical distributions. The aim is a heuristic for choosing the most appropriate discretization method once preliminary analysis or visualization has determined the statistical distribution of the attribute to be discretized. The comparative effectiveness of discretization under each data distribution is also a primary focus. The data were analyzed by inducing a decision tree classifier (C4.5) on the discretized data, and an error measure was used to determine the relative value of discretization. The experiments showed that the method of discretization and the level of inherent error placed in the class attribute have a major impact on the classification errors generated post-discretization. More importantly, the general effectiveness of discretization varies significantly with the shape of the data distribution considered. Distributions that are highly skewed or have high peaks tend to result in higher classification errors, and the relative superiority of supervised over unsupervised discretization is diminished significantly for these data distributions.
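To illustrate why the shape of a distribution can matter to a discretizer, the following minimal sketch compares two unsupervised methods on a skewed and a flat sample. The abstract does not name the six methods studied, so equal-width and equal-frequency binning are assumed here only as common examples; the function names, the sample size of 1000, the choice of four intervals, and the exponential and uniform distributions are all illustrative assumptions, and no classifier is induced.

```python
import random
from bisect import bisect_right
from collections import Counter


def equal_width_cuts(values, k):
    """Return k-1 cut points splitting [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Return k-1 cut points taken at the i/k positions of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def discretize(values, cuts):
    """Map each value to the index of its interval (0 .. len(cuts))."""
    return [bisect_right(cuts, v) for v in values]


def bin_counts(values, cuts):
    """Count how many values fall in each interval, in interval order."""
    counts = Counter(discretize(values, cuts))
    return [counts.get(b, 0) for b in range(len(cuts) + 1)]


# Hypothetical samples: one highly skewed, one flat (fixed seed for repeatability).
rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(1000)]
uniform = [rng.uniform(0.0, 1.0) for _ in range(1000)]

for name, sample in (("exponential", skewed), ("uniform", uniform)):
    print(name, "equal-width:", bin_counts(sample, equal_width_cuts(sample, 4)))
    print(name, "equal-frequency:", bin_counts(sample, equal_frequency_cuts(sample, 4)))
```

On the skewed sample, equal-width binning places most values in the first interval, while equal-frequency binning keeps the intervals balanced by construction. This shows only how bin occupancy depends on distribution shape; it does not reproduce the study's classification-error results.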