Clustering for metric and nonmetric distance measures

  • Authors:
  • Marcel R. Ackermann, Johannes Blömer, Christian Sohler

  • Affiliations:
  • University of Paderborn, Paderborn, Germany (Ackermann, Blömer); Technische Universität Dortmund, Dortmund, Germany (Sohler)

  • Venue:
  • ACM Transactions on Algorithms (TALG)
  • Year:
  • 2010

Abstract

We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of n points, our goal is to find a set C of k centers such that the sum of errors D(P,C) = ∑_{p ∈ P} min_{c ∈ C} D(p,c) is minimized. The main result of this article can be stated as follows: there exists a (1+ε)-approximation algorithm for the k-median problem with respect to D if the 1-median problem can be approximated within a factor of (1+ε) by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm runs in time n · 2^{O(mk log(mk/ε))}, where m is a constant that depends only on ε and D. Using this characterization, we obtain the first linear-time (1+ε)-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we recover previously known results for the Euclidean k-median and Euclidean k-means problems in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
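
To make the objective concrete, the following Python sketch (all names are hypothetical; the article itself gives no code) evaluates the sum-of-errors cost D(P,C) for an arbitrary dissimilarity measure, using the Kullback-Leibler divergence as one of the measures covered by the result, and illustrates the kind of constant-size-sample 1-median subroutine the characterization requires. For simplicity the sketch picks the best center among the sampled points rather than solving the 1-median problem on the sample exactly, so it illustrates the idea, not the paper's algorithm.

    import math
    import random

    def kl_divergence(p, q):
        """Kullback-Leibler divergence D(p || q) between two discrete
        probability distributions with strictly positive entries."""
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

    def clustering_cost(P, C, D):
        """Sum-of-errors objective D(P,C) = sum_{p in P} min_{c in C} D(p,c)."""
        return sum(min(D(p, c) for c in C) for p in P)

    def sampled_1median(P, D, sample_size=20, trials=10):
        """Sampling-based 1-median heuristic: draw a constant-size random
        sample and return the sampled point of lowest cost on all of P.
        (The paper's condition asks for an exact 1-median solution on the
        sample; restricting candidates to the sampled points is a
        simplification made here for illustration.)"""
        best_center, best_cost = None, float("inf")
        for _ in range(trials):
            sample = random.sample(P, min(sample_size, len(P)))
            for c in sample:
                cost = clustering_cost(P, [c], D)
                if cost < best_cost:
                    best_center, best_cost = c, cost
        return best_center

In the framework analyzed in the article, a subroutine of this flavor is invoked repeatedly within the recursive algorithm of Kumar et al. [2004], finding an approximate center for one (large) cluster at a time; the constant sample size is what keeps the overall running time linear in n.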