Clustering is the problem of dividing a dataset into subsets, called clusters, that are both homogeneous and well separated. Many criteria have been devised that measure both of these properties simultaneously. Two such criteria are centroid-distance, used by the popular k-means algorithm, and the sum of all squared intra-cluster distances, which we call all-squares. This paper compares these two criteria in the context of clustering multisets defined over a metric space. We show that optimal clusterings under both criteria can be consistent, meaning that identical elements belong to the same cluster, but that while centroid-distance always produces linearly separable solutions, all-squares does not. It has recently been shown that finding optimal clusterings under centroid-distance in Euclidean space is NP-hard. We show that the decision problems associated with both optimisation problems are NP-complete in a simple, three-valued metric space, and that the all-squares decision problem remains NP-complete in Euclidean space. In contrast, if the metric is the simple 0/1 metric, both problems are in P. Finally, we introduce a new metric on clusterings, based on the earth mover's distance, which we call the assignment metric, and use it to show that optimal clusterings under the two criteria can be as different as two clusterings can possibly be under both our metric and the well-known variation of information metric.
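To make the two criteria concrete, the following is an illustrative sketch (not taken from the paper; all function names are our own) that evaluates both objectives on a toy Euclidean dataset. It also exercises the standard identity that, within a single cluster, the sum of squared pairwise distances equals the cluster size times the sum of squared distances to the centroid, which is why the two criteria can rank partitions differently when cluster sizes are unequal.

```python
# Illustrative sketch of the two clustering criteria discussed above.
# Function names (centroid_distance, all_squares) are our own labels,
# not identifiers from the paper.
from itertools import combinations


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid_distance(clusters):
    """Centroid-distance criterion (the k-means objective): for each
    cluster, sum the squared distances from its points to its centroid."""
    total = 0.0
    for c in clusters:
        dim = len(c[0])
        centroid = [sum(p[i] for p in c) / len(c) for i in range(dim)]
        total += sum(sq_dist(p, centroid) for p in c)
    return total


def all_squares(clusters):
    """All-squares criterion: sum of squared distances over all unordered
    intra-cluster pairs, across every cluster."""
    return sum(sq_dist(p, q)
               for c in clusters
               for p, q in combinations(c, 2))


clusters = [[(0.0, 0.0), (2.0, 0.0)],
            [(5.0, 1.0), (7.0, 1.0), (6.0, 3.0)]]

# Within each cluster C, the pairwise sum equals |C| times the centroid
# sum, so all-squares weights each cluster's spread by its size.
print(centroid_distance(clusters))  # 20/3
print(all_squares(clusters))        # 18.0
```

Because of the size weighting, the two objectives agree on partitions into equal-sized clusters but can prefer different partitions otherwise, which is consistent with the divergence results stated in the abstract.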