CD: a coupled discretization algorithm

Authors:
Can Wang;Mingchun Wang;Zhong She;Longbing Cao
Affiliations:
Centre for Quantum Computation and Intelligent Systems, Advanced Analytics Institute, University of Technology, Sydney, Australia;School of Science, Tianjin University of Technology and Education, China;Centre for Quantum Computation and Intelligent Systems, Advanced Analytics Institute, University of Technology, Sydney, Australia;Centre for Quantum Computation and Intelligent Systems, Advanced Analytics Institute, University of Technology, Sydney, Australia
Venue:
PAKDD'12: Proceedings of the 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Volume Part II
Year:
2012

Abstract

Discretization plays an important role in data mining and machine learning. While numeric data predominates in the real world, many supervised learning algorithms are restricted to discrete variables. Consequently, much research has addressed discretization, the process of converting continuous attribute values into a finite number of intervals. Recent work derived from entropy-based discretization methods, which has produced impressive results, introduces information attribute dependency to reduce the uncertainty level of a decision table, but it pays no attention to the increase in certainty degree measured by the positive domain ratio. This paper proposes a discretization algorithm based on both the positive domain and its coupling with information entropy, which considers not only information attribute dependency but also deterministic feature relationships. Extensive experiments on UCI data sets provide evidence that the proposed coupled discretization (CD) algorithm generally outperforms seven other existing methods, as well as the positive-domain-based algorithm also proposed in this paper, in terms of simplicity, stability, consistency, and accuracy.
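The two criteria the abstract contrasts can be made concrete with a small sketch. The Python below is illustrative only and is not the CD algorithm: it discretizes continuous attributes with a given set of cut points, then scores the resulting decision table in two ways. The first is the conditional entropy H(D | C) of the class given the discretized attributes (the information-entropy view). The second is a positive domain ratio, which this sketch assumes corresponds to the rough-set dependency degree |POS_C(D)| / |U|, the share of objects whose equivalence class contains a single class. All function names are hypothetical, and how CD actually couples the two measures is not reproduced here.

```python
import bisect
import math
from collections import Counter, defaultdict


def discretize(values, cuts):
    """Map each continuous value to the index of its interval.

    `cuts` must be sorted; k cut points give k + 1 intervals, and a value
    equal to a cut point is placed in the interval to its right.
    """
    return [bisect.bisect_right(cuts, v) for v in values]


def equivalence_classes(rows):
    """Group object indices by their tuple of discretized attribute values."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row)].append(i)
    return list(groups.values())


def conditional_entropy(rows, labels):
    """H(D | C) in bits: class entropy inside each equivalence class,
    weighted by the fraction of objects in that class."""
    n = len(labels)
    h = 0.0
    for block in equivalence_classes(rows):
        counts = Counter(labels[i] for i in block)
        for c in counts.values():
            p = c / len(block)
            h -= (len(block) / n) * p * math.log2(p)
    return h


def positive_domain_ratio(rows, labels):
    """Assumed reading of the positive domain ratio: |POS_C(D)| / |U|,
    the share of objects whose equivalence class is class-pure."""
    pure = sum(
        len(block)
        for block in equivalence_classes(rows)
        if len({labels[i] for i in block}) == 1
    )
    return pure / len(labels)


# Hypothetical toy data: one continuous attribute and a binary class.
values = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
labels = ["a", "a", "a", "b", "b", "a"]

for cuts in ([2.5], [2.5, 3.75]):
    rows = [[b] for b in discretize(values, cuts)]
    print(cuts, conditional_entropy(rows, labels), positive_domain_ratio(rows, labels))
```

With the single cut 2.5, the upper interval mixes both classes, so half the objects fall outside the positive region (ratio 0.5) and H(D | C) is about 0.459 bits. Adding a second cut at 3.75 makes every interval class-pure, giving a ratio of 1.0 and zero entropy. For several attributes, build each row as a tuple of per-attribute interval indices, for example `list(zip(discretize(x1, c1), discretize(x2, c2)))`. Entropy measures how uncertain the class remains, while the positive domain ratio counts objects that are classified with certainty, which is the distinction the abstract draws.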