A global unsupervised data discretization algorithm based on collective correlation coefficient

Authors:
An Zeng;Qi-Gang Gao;Dan Pan
Affiliations:
Guangdong University of Technology and Dalhousie University;Dalhousie University;Saint Mary's University
Venue:
IEA/AIE'11 Proceedings of the 24th international conference on Industrial engineering and other applications of applied intelligent systems conference on Modern approaches in applied intelligence - Volume Part I
Year:
2011

Abstract

Data discretization is an important task for some data mining algorithms, such as association rule discovery and Bayesian learning. For these algorithms, proper discretization can both significantly improve the quality and understandability of the discovered knowledge and reduce running time. We present a Global Unsupervised Discretization Algorithm based on the Collective Correlation Coefficient (GUDA-CCC), which has two merits: 1) it does not require class labels in the training data; 2) it preserves the ranking of attribute importance in a data set while minimizing the information loss, measured by mean square error. Attribute importance is calibrated by the CCC, which is derived from principal component analysis (PCA). The idea behind GUDA-CCC is that staying close to the original data set may be the best policy, especially when other available information is not reliable enough to be leveraged in the discretization. Experiments on benchmark data sets illustrate the effectiveness of the GUDA-CCC algorithm.
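
To make the "information loss measured by mean square error" criterion concrete, here is a minimal sketch in Python. It is not the GUDA-CCC algorithm: the abstract does not define the CCC or the search procedure, so neither is implemented. The sketch assumes plain equal-width binning with each value replaced by the mean of its bin, and the names equal_width_discretize, mse, and the toy data are hypothetical, introduced only for illustration.

```python
from statistics import fmean


def equal_width_discretize(values, k):
    """Replace each value with the mean of its equal-width bin (k bins).

    Assumption: equal-width binning is a stand-in; it is not the
    cut-point selection used by GUDA-CCC.
    """
    lo, hi = min(values), max(values)
    if hi == lo or k <= 1:
        # Degenerate case: one interval covering every value.
        m = fmean(values)
        return [m] * len(values)
    width = (hi - lo) / k
    # Bin index per value; the maximum is clamped into the last bin.
    idx = [min(int((v - lo) / width), k - 1) for v in values]
    bins = {}
    for i, v in zip(idx, values):
        bins.setdefault(i, []).append(v)
    means = {i: fmean(vs) for i, vs in bins.items()}
    return [means[i] for i in idx]


def mse(original, discretized):
    """Mean square error between original and discretized values."""
    return fmean([(a - b) ** 2 for a, b in zip(original, discretized)])


# Hypothetical toy attribute, for illustration only.
data = [0.1, 0.4, 0.5, 1.2, 1.9, 2.0, 3.7, 4.1, 4.4, 5.0]
losses = {k: mse(data, equal_width_discretize(data, k)) for k in (1, 2, 5)}
for k, loss in losses.items():
    print(f"k={k}: MSE={loss:.4f}")
```

On this toy attribute the loss shrinks as the number of intervals grows, which is the trade-off a discretizer must balance against keeping the number of intervals small. An unsupervised method in the spirit of the abstract would additionally check that the attribute-importance ranking computed on the discretized data matches the ranking on the original data.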