Categorical Data Clustering Using the Combinations of Attribute Values

Authors:
Hee-Jung Do; Jae-Yearn Kim
Affiliations:
Department of Industrial Engineering, Hanyang University, Sungdong-gu, Seoul, Korea 133-791 (both authors)
Venue:
ICCSA '08: Proceedings of the International Conference on Computational Science and Its Applications, Part II
Year:
2008

Abstract

Clustering is an important technique for exploratory data analysis. While most earlier clustering algorithms focused on numerical data, real-world problems and data mining applications frequently involve categorical data. Here, we propose a new clustering algorithm for categorical data that is based on the frequency of attribute value combinations. For each record, our algorithm enumerates all combinations of its attribute values, each combination being a subset of the record's attribute values, and then groups the records according to how frequently these combinations occur. Because the algorithm considers every subset of a record's attribute values, records in the same cluster share not only similar sets of attribute values but also strongly associated attribute values. We evaluated the algorithm on real and synthetic data sets, and the experimental results demonstrate its effectiveness.
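The abstract does not specify how combination frequencies are turned into clusters, so the following Python sketch is only an illustration under stated assumptions. Items are encoded as (attribute index, value) pairs, the frequency of a combination is taken to be the number of records containing it, and the grouping step is a hypothetical greedy rule with our own parameters `min_support` and `min_size`; it is not the authors' procedure. What the sketch does show are the two ideas the abstract names: enumerating every attribute-value combination of a record, and counting how often each combination occurs across the data set.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Hashable, List, Sequence, Tuple

# Assumption: an item is an (attribute_index, value) pair, so "red" in
# attribute 0 is never confused with "red" in another attribute.
Item = Tuple[int, Hashable]
Combo = FrozenSet[Item]


def all_combinations(record: Sequence[Hashable]) -> List[Combo]:
    """Every non-empty subset of a record's attribute values (2^m - 1 subsets)."""
    items = list(enumerate(record))
    return [
        frozenset(c)
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
    ]


def combination_frequencies(records: Sequence[Sequence[Hashable]]) -> Counter:
    """Number of records that contain each attribute-value combination."""
    freq: Counter = Counter()
    for rec in records:
        freq.update(all_combinations(rec))
    return freq


def cluster_by_combinations(
    records: Sequence[Sequence[Hashable]], min_support: int = 2, min_size: int = 2
) -> List[List[int]]:
    """HYPOTHETICAL grouping rule, not the paper's: visit combinations with at
    least `min_size` values and `min_support` records, most frequent first
    (ties go to larger combinations), and make each one a cluster of the
    still-unassigned records that contain it."""
    freq = combination_frequencies(records)
    candidates = sorted(
        (c for c, n in freq.items() if n >= min_support and len(c) >= min_size),
        key=lambda c: (-freq[c], -len(c), sorted(map(repr, c))),
    )
    record_sets = [frozenset(enumerate(rec)) for rec in records]
    unassigned = set(range(len(records)))
    clusters: List[List[int]] = []
    for combo in candidates:
        members = sorted(i for i in unassigned if combo <= record_sets[i])
        if members:
            clusters.append(members)
            unassigned.difference_update(members)
    # Records not covered by any frequent combination become singleton clusters.
    clusters.extend([i] for i in sorted(unassigned))
    return sorted(clusters)


records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]
print(cluster_by_combinations(records))  # [[0, 1, 2], [3, 4, 5]]
```

Enumerating all 2^m - 1 subsets per record grows exponentially with the number of attributes m, so this unpruned sketch suits only data sets with few attributes. Requiring `min_size >= 2` makes each cluster rest on co-occurring, associated values rather than on a single common value.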