The k-means algorithm is the method of choice for clustering large-scale data sets, and it performs exceedingly well in practice. Most of the theoretical work, however, is restricted to the case in which squared Euclidean distances are used as the similarity measure. In many applications, data is to be clustered with respect to other measures, such as relative entropy, which is commonly used to cluster web pages. In this paper, we analyze the running time of the k-means method for Bregman divergences, a very general class of similarity measures that includes both squared Euclidean distances and relative entropy. We show that the exponential lower bound known for the Euclidean case carries over to almost every Bregman divergence. To narrow the gap between theory and practice, we also study k-means in the semi-random input model of smoothed analysis. For the case that $n$ data points in $\mathbb{R}^d$ are perturbed by noise with standard deviation $\sigma$, we show that for almost arbitrary Bregman divergences the expected running time is bounded by $\mathrm{poly}(n^{\sqrt{k}}, 1/\sigma)$ and $k^{kd} \cdot \mathrm{poly}(n, 1/\sigma)$.
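To make the object of study concrete, the following is a minimal sketch of the k-means (Lloyd) iteration under a Bregman divergence, here instantiated with relative entropy (KL divergence). The function names and the deterministic initialization are illustrative choices, not from the paper; a key property exploited is that for any Bregman divergence, the arithmetic mean of a cluster minimizes the total divergence to its points, so the update step is the same as in the Euclidean case.

```python
import numpy as np

def kl_divergence(p, q):
    # Relative entropy D(p || q); assumes strictly positive vectors.
    return np.sum(p * (np.log(p) - np.log(q)))

def bregman_kmeans(points, k, iters=100):
    """Lloyd's iteration for a Bregman divergence (here: relative entropy).

    For every Bregman divergence, the divergence-minimizing center of a
    cluster is the arithmetic mean of its points, so only the assignment
    step differs from ordinary squared-Euclidean k-means.
    """
    # Illustrative deterministic initialization: take the first k points.
    # (Practical implementations seed the centers randomly.)
    centers = points[:k].copy()
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center under KL.
        labels = np.array([
            np.argmin([kl_divergence(p, c) for c in centers])
            for p in points
        ])
        # Update step: arithmetic mean minimizes total Bregman divergence.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The number of such iterations until convergence is exactly the running time analyzed in the paper: exponential in the worst case for almost every Bregman divergence, but polynomially bounded in the smoothed setting.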