Quantization of continuous variables is important in data analysis, especially for model classes such as Bayesian networks and decision trees that operate on discrete variables. Discretization is often based on the distribution of the input variables alone, even though additional information, for example in the form of class membership, is frequently available and could be used to improve the quality of the results. In this paper, quantization methods based on equal-width intervals, maximum entropy, maximum mutual information, and a novel approach combining maximum mutual information with entropy are considered. The first two approaches do not take class membership into account, whereas the latter two do. The relative merits of each method are compared in an empirical setting on two data sets from a direct marketing problem, where the quality of the quantization is measured by mutual information and by the performance of Naive Bayes and C5 decision tree classifiers.
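As a concrete illustration of the unsupervised schemes and of the mutual-information quality measure, the sketch below implements equal-width binning and an equal-frequency approximation to maximum-entropy quantization, then scores each result by the mutual information between the discretized variable and the class labels. The function names, bin count, and toy data are assumptions introduced here for illustration, not the authors' code or experimental setup.

```python
import numpy as np

def equal_width_bins(x, k):
    """Split the range of x into k equally wide intervals."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    # Interior edges only; digitize then yields bin indices 0..k-1.
    return np.clip(np.digitize(x, edges[1:-1]), 0, k - 1)

def equal_frequency_bins(x, k):
    """Approximate maximum-entropy quantization: place roughly the same
    number of observations into each of the k bins."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1))
    return np.clip(np.digitize(x, edges[1:-1]), 0, k - 1)

def mutual_information(bins, y):
    """I(X;Y) in nats, estimated from the empirical joint distribution of
    the discretized variable and the class labels."""
    joint = np.zeros((bins.max() + 1, y.max() + 1))
    for b, c in zip(bins, y):
        joint[b, c] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: a skewed continuous feature whose upper tail is
    # enriched with the positive class, loosely mimicking a response
    # variable in a direct-marketing setting.
    x = rng.exponential(scale=1.0, size=5000)
    y = (rng.random(5000) < np.clip(x / 5.0, 0.0, 0.9)).astype(int)
    for name, binner in [("equal width", equal_width_bins),
                         ("equal frequency", equal_frequency_bins)]:
        b = binner(x, 8)
        print(f"{name:15s} I(X;Y) = {mutual_information(b, y):.4f} nats")
```

On data with a skewed marginal distribution such as this, equal-frequency binning typically spreads observations more evenly across bins than equal-width binning, which is why the two unsupervised schemes can yield noticeably different mutual-information scores even before any class information is used in choosing the cut points.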