Projected clustering for categorical datasets

Authors:
Minho Kim;R. S. Ramakrishna
Affiliations:
Bioinformatics Team, Electronics and Telecommunications Research Institute (ETRI), 161 Gajeong-dong, Yuseong-gu, Daejeon 305-350, Republic of Korea;Department of Information and Communications, Gwangju Institute of Science and Technology, 1 Oryong-dong, Buk-gu, Gwangju 500-712, Republic of Korea
Venue:
Pattern Recognition Letters
Year:
2006

Citing 14
Cited 3

A Validity Measure for Fuzzy Clustering

IEEE Transactions on Pattern Analysis and Machine Intelligence
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
CACTUS—clustering categorical data using summaries

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering transactions using large items

Proceedings of the eighth international conference on Information and knowledge management
ROCK: a robust clustering algorithm for categorical attributes

Information Systems
Fuzzy Models and Algorithms for Pattern Recognition and Image Processing

Fuzzy Models and Algorithms for Pattern Recognition and Image Processing
COOLCAT: an entropy-based algorithm for categorical clustering

Proceedings of the eleventh international conference on Information and knowledge management
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
Redefining Clustering for High-Dimensional Applications

IEEE Transactions on Knowledge and Data Engineering
CLOPE: a fast and effective clustering algorithm for transactional data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering binary data streams with K-means

DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Clustering and its validation in a symbolic framework

Pattern Recognition Letters
Efficient Disk-Based K-Means Clustering for Relational Databases

IEEE Transactions on Knowledge and Data Engineering
A Cluster Separation Measure

IEEE Transactions on Pattern Analysis and Machine Intelligence

Categorical Data Clustering Using the Combinations of Attribute Values

ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Adjusting the clustering results referencing an external set

ICSI'10 Proceedings of the First international conference on Advances in Swarm Intelligence - Volume Part II
A weighting k-modes algorithm for subspace clustering of categorical data

Neurocomputing

Quantified Score

Hi-index	0.10

Visualization

Abstract

This paper deals with the problem of clustering categorical datasets. Categorical data typically suffer from limited measuring levels and exhibit sparsity in a space of very high dimension. Conventional dissimilarity measures are, therefore, inadequate. We propose a new clustering algorithm based on projected clustering. The proposed algorithm, although hierarchical in essence, avoids the characteristic error propagation through reassignment and deletion of bad clusters. We also propose new indices for cluster validation in categorical datasets, an area that is almost unexplored. We present techniques for finding optimal number of clusters, and for initialization of centers of clusters. Experimental results demonstrate the effectiveness of the proposed clustering algorithm. The cluster validation for categorical datasets is also shown to be quite efficient.