Clustering is an important technique for exploratory data analysis. While most earlier clustering algorithms focused on numerical data, real-world problems and data mining applications frequently involve categorical data. Here, we propose a new clustering algorithm for categorical data that is based on the frequency of attribute value combinations. Our algorithm enumerates every combination of attribute values in a record, where each combination is a subset of that record's attribute values, and then groups the records using the frequency of these combinations. Because the algorithm considers all subsets of the attribute values in a record, records in the same cluster have not only similar attribute value sets but also strongly associated attribute values. We evaluated our algorithm on real and synthetic data sets, and the experimental results demonstrate its effectiveness.
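The counting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combination_frequencies`, the dictionary record format, and the toy records are all assumptions made for illustration, and the grouping step is omitted because the abstract does not specify it.

```python
from collections import Counter
from itertools import combinations


def combination_frequencies(records):
    """Count how often each combination of (attribute, value) pairs occurs.

    Hypothetical helper: each record is assumed to be a dict that maps
    an attribute name to a categorical value.
    """
    counts = Counter()
    for record in records:
        # Sort the pairs so the same combination always yields the same key.
        items = sorted(record.items())
        # Enumerate every non-empty subset of the record's attribute values.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return counts


# Toy data, invented for illustration only.
records = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
]
freq = combination_frequencies(records)
print(freq[(("color", "red"), ("shape", "round"))])
```

A record with m attributes yields 2^m - 1 non-empty combinations, so a naive enumeration like this one is practical only for records with few attributes.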