Data discretization is an important task for certain types of data mining algorithms, such as association rule discovery and Bayesian learning. For these algorithms, proper discretization can significantly improve the quality and understandability of the discovered knowledge and can also reduce running time. We present a Global Unsupervised Discretization Algorithm based on the Collective Correlation Coefficient (GUDA-CCC), which has two main merits: 1) it does not require class labels in the training data, and 2) it preserves the rank order of attribute importance in a data set while minimizing the information loss, measured by mean square error. Attribute importance is quantified by the CCC, which is derived from principal component analysis (PCA). The idea behind GUDA-CCC is that staying close to the original data set may be the best policy, especially when other available information is not reliable enough to guide the discretization. Experiments on benchmark data sets illustrate the effectiveness of the GUDA-CCC algorithm.
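
To make the notion of information loss measured by mean square error concrete, the following minimal sketch discretizes a single attribute and reports the MSE between the original values and their discretized representatives. It is not GUDA-CCC: the equal-width binning, the use of bin means as representatives, and all function names (`equal_width_bins`, `reconstruct`, `discretization_loss`) are assumptions chosen only for illustration, and the CCC-based ranking of attributes is not modeled here.

```python
from statistics import mean


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals (indices 0..k-1).

    Equal-width binning is a stand-in here, not the GUDA-CCC procedure.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def reconstruct(values, bins):
    """Replace each value with the mean of the values that share its bin."""
    groups = {}
    for v, b in zip(values, bins):
        groups.setdefault(b, []).append(v)
    centers = {b: mean(vs) for b, vs in groups.items()}
    return [centers[b] for b in bins]


def mse(original, approx):
    """Mean square error between two equally long sequences."""
    return mean((o - a) ** 2 for o, a in zip(original, approx))


def discretization_loss(values, k):
    """Information loss (MSE) incurred by discretizing values into k bins."""
    bins = equal_width_bins(values, k)
    return mse(values, reconstruct(values, bins))


if __name__ == "__main__":
    # Hypothetical attribute values, used only to exercise the functions.
    data = [1.0, 1.2, 1.9, 4.0, 4.3, 8.8, 9.1, 9.5]
    for k in (1, 2, 3, 4):
        print(k, discretization_loss(data, k))
```

With a single bin the loss equals the population variance of the attribute, and it drops to zero once every distinct value has its own bin; a discretizer that aims to stay close to the original data would look for cut points that keep this quantity small for a given number of intervals.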