In this paper, we study clustering with respect to the k-modes objective function, a natural formulation of clustering for categorical data. One of the main contributions of this paper is to establish the connection between k-modes and k-median: the optimum of k-median is at most twice the optimum of k-modes for the same categorical data clustering problem. Based on this observation, we derive a deterministic algorithm that achieves an approximation factor of 2. Furthermore, we prove that the distance measure in k-modes defines a metric. Hence, we are able to extend existing approximation algorithms for metric k-median to k-modes. Empirical results verify the superiority of our method.
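The relationship between the two objectives can be illustrated on a single small cluster. The sketch below is not the paper's algorithm. It assumes that the k-modes distance is the simple-matching (Hamming) dissimilarity and that k-median restricts centers to input points; the function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def hamming(x, y):
    """Simple-matching dissimilarity: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def mode_of(cluster):
    """Attribute-wise most frequent value (assumed k-modes center of one cluster)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def cost_to(center, cluster):
    """Sum of distances from every point in the cluster to the given center."""
    return sum(hamming(p, center) for p in cluster)


# Hypothetical toy cluster with three categorical attributes.
cluster = [("a", "x", "1"), ("a", "y", "2"), ("b", "x", "2")]

# k-modes style center: may be any attribute vector, not necessarily a data point.
mode = mode_of(cluster)
mode_cost = cost_to(mode, cluster)

# k-median style center (assumption: must be one of the input points).
medoid_cost = min(cost_to(candidate, cluster) for candidate in cluster)

# The distance satisfies the triangle inequality on every ordered triple here.
triangle_ok = all(
    hamming(p, r) <= hamming(p, q) + hamming(q, r)
    for p, q, r in permutations(cluster, 3)
)

print(mode, mode_cost, medoid_cost, triangle_ok)
```

On this toy cluster the mode is not one of the data points, so the restricted center is strictly more expensive, yet its cost stays within twice the mode's cost, which is consistent with the factor-2 bound stated in the abstract.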