Expert Systems with Applications: An International Journal
As data volumes grow rapidly, clustering a very large data set becomes time-consuming. To improve efficiency, sampling is commonly used to reduce the size of the data set, so that clustering runs on the sample only. Sampling, however, leaves the objects outside the sample unlabeled, and allocating them to the proper clusters is a difficult problem. This paper proposes a novel similarity measure for categorical data, based on the frequency of attribute values within a given cluster and the distribution of attribute values across clusters, that allocates each unlabeled object to the most appropriate cluster. A labeling algorithm for categorical data built on this measure is presented, and its time complexity is analyzed. Experiments on real-world data sets demonstrate the effectiveness of the proposed algorithm.
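
To make the labeling idea concrete, the following is a minimal Python sketch, not the paper's actual measure, which the abstract does not state. It assumes a hypothetical score that, for each attribute, multiplies the relative frequency of the object's value inside a cluster by the share of that value's relative frequency that the cluster holds across all clusters. The names `build_profiles`, `similarity`, and `label_object` are illustrative only.

```python
from collections import Counter


def build_profiles(clusters):
    """Summarize each cluster of the clustered sample.

    clusters: dict mapping a cluster label to a non-empty list of
    equal-length tuples of categorical values.
    Returns: dict mapping label -> (cluster size, per-attribute value counts).
    """
    profiles = {}
    for label, objects in clusters.items():
        n_attrs = len(objects[0])
        counts = [Counter(obj[j] for obj in objects) for j in range(n_attrs)]
        profiles[label] = (len(objects), counts)
    return profiles


def similarity(obj, label, profiles):
    """Hypothetical similarity between an object and a cluster (an assumption,
    not the measure defined in the paper)."""
    size, counts = profiles[label]
    score = 0.0
    for j, value in enumerate(obj):
        # Relative frequency of this attribute value inside the cluster.
        freq = counts[j][value] / size
        # Sum of the value's relative frequencies over all clusters.
        total = sum(c[j][value] / s for s, c in profiles.values())
        if total > 0:
            # Weight the in-cluster frequency by how strongly the value
            # is concentrated in this cluster compared with the others.
            score += freq * (freq / total)
    return score


def label_object(obj, profiles):
    """Assign an unlabeled object to the cluster with the highest score."""
    return max(profiles, key=lambda lbl: similarity(obj, lbl, profiles))


if __name__ == "__main__":
    sample = {
        "A": [("red", "small"), ("red", "medium")],
        "B": [("blue", "large"), ("blue", "small")],
    }
    profiles = build_profiles(sample)
    print(label_object(("red", "small"), profiles))
```

In this sketch the cluster summaries are built once from the clustered sample, and each remaining object is then labeled independently, so the full data set never has to be clustered. As written, scoring one object against k clusters over m attributes takes O(k²m) time; caching the cross-cluster totals per attribute value would reduce this to O(km).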