This paper addresses the problem of clustering categorical datasets. Categorical data typically have limited measurement levels and are sparse in a space of very high dimension, so conventional dissimilarity measures are inadequate. We propose a new clustering algorithm based on projected clustering. Although hierarchical in essence, the proposed algorithm avoids the error propagation characteristic of hierarchical methods by means of reassignment and the deletion of bad clusters. We also propose new indices for cluster validation in categorical datasets, an area that is almost unexplored. We present techniques for finding the optimal number of clusters and for initializing cluster centers. Experimental results demonstrate the effectiveness of the proposed clustering algorithm. The cluster validation for categorical datasets is also shown to be quite efficient.
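The claim that conventional dissimilarity measures are inadequate for sparse, high-dimensional categorical data can be illustrated with a small sketch. It is not the algorithm proposed in the paper: the function name `simple_matching`, the toy records, and the choice of "relevant" attributes are all assumptions made for illustration. It uses the simple matching (normalized Hamming) dissimilarity and compares the full attribute space with a projection onto a subset of attributes.

```python
def simple_matching(a, b, attrs=None):
    """Fraction of attributes on which records a and b differ.

    If attrs is given, only those attribute positions are compared,
    which mimics measuring dissimilarity in a projected subspace.
    """
    idx = list(range(len(a))) if attrs is None else list(attrs)
    return sum(a[i] != b[i] for i in idx) / len(idx)


# Hypothetical toy records: assume attributes 0 and 1 carry the cluster
# structure and attributes 2..5 are irrelevant noise.
x = ("red", "round", "a", "q", "m", "z")
y = ("red", "round", "b", "r", "n", "y")    # same cluster as x (assumed)
z = ("blue", "square", "a", "q", "m", "y")  # different cluster (assumed)

# Full space: x appears closer to z than to y because of the noise attributes.
full_xy = simple_matching(x, y)
full_xz = simple_matching(x, z)

# Projected onto the assumed relevant attributes: x and y coincide.
proj_xy = simple_matching(x, y, attrs=[0, 1])
proj_xz = simple_matching(x, z, attrs=[0, 1])

print(full_xy, full_xz, proj_xy, proj_xz)
```

In this toy case the full-space measure ranks the pair from different clusters as more similar than the pair from the same cluster, while the projected measure separates them. Which attributes are relevant is assumed here; a projected clustering method would have to discover them.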