Discretization, as a preprocessing step for data mining, converts the continuous attributes of a data set into discrete ones so that machine learning algorithms can treat them as nominal features. Discretization methods that use entropy-based criteria form a large class of algorithms. However, as a measure of class homogeneity, entropy does not always accurately reflect how homogeneous the classes within an interval are. In this paper, we therefore propose a new measure of the class heterogeneity of intervals, defined from the viewpoint of the class probabilities themselves. Based on this definition of heterogeneity, we present a new criterion for evaluating a discretization scheme and analyze its properties theoretically. We also propose a heuristic method for finding an approximately optimal discretization scheme. Finally, we compare our method, in terms of predictive error rate and tree size, with Ent-MDLC, a representative entropy-based discretization method well known for its good performance. Our method produces better results than Ent-MDLC, although the improvement is not significant. It can therefore serve as a good alternative to entropy-based discretization methods.
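The abstract does not give the formula for the proposed heterogeneity measure, so the sketch below illustrates only the entropy-based baseline it is compared against. It assumes that Ent-MDLC denotes recursive class-entropy minimization with the minimum description length (MDL) stopping rule of Fayyad and Irani; all function names are hypothetical, and this is a minimal sketch rather than a reference implementation.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (bits) of the class distribution in one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(xs, ys):
    """Find the cut index i (split ys[:i] | ys[i:]) minimizing weighted class entropy.

    xs must be sorted; cuts are only considered between distinct adjacent values.
    Returns (cut_point, i) or None if no cut is possible.
    """
    n = len(ys)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue
        left, right = ys[:i], ys[i:]
        e = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
        if best is None or e < best[0]:
            best = (e, (xs[i - 1] + xs[i]) / 2, i)
    return None if best is None else (best[1], best[2])


def mdl_accepts(ys, i):
    """MDL stopping test (assumed Fayyad-Irani form): accept the split if gain is large enough."""
    n = len(ys)
    left, right = ys[:i], ys[i:]
    ent, e1, e2 = class_entropy(ys), class_entropy(left), class_entropy(right)
    gain = ent - (len(left) * e1 + len(right) * e2) / n
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * e1 - k2 * e2)
    return gain > (math.log2(n - 1) + delta) / n


def ent_mdl_cuts(values, labels):
    """Recursively split an attribute; return the sorted list of accepted cut points."""
    pairs = sorted(zip(values, labels))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    found = best_cut(xs, ys)
    if found is None:
        return []
    cut, i = found
    if not mdl_accepts(ys, i):
        return []
    return sorted(ent_mdl_cuts(xs[:i], ys[:i]) + [cut] + ent_mdl_cuts(xs[i:], ys[i:]))


if __name__ == "__main__":
    values = list(range(1, 13))
    labels = ["a"] * 6 + ["b"] * 6
    print(ent_mdl_cuts(values, labels))
```

On the toy attribute above, the two classes separate cleanly, so one cut at 6.5 passes the MDL test and the pure sub-intervals are not split further. Note that entropy scores an interval only through its class distribution; the paper's argument is that this score can misjudge homogeneity, which is what its probability-based measure is meant to address.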