Clustering for metric and non-metric distance measures

  • Authors:
  • Marcel R. Ackermann; Johannes Blömer; Christian Sohler

  • Affiliations:
  • University of Paderborn, Paderborn, Germany (all authors)

  • Venue:
  • Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2008

Abstract

We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P, our goal is to find a set C of size k such that the sum of errors D(P, C) = Σ_{p∈P} min_{c∈C} D(p, c) is minimized. The main result of this paper can be stated as follows: there exists an O(n · 2^{(k/ε)^{O(1)}}) time (1 + ε)-approximation algorithm for the k-median problem with respect to D, provided that the 1-median problem can be approximated within a factor of (1 + ε) by taking a random sample of constant size and solving the 1-median problem on the sample exactly. Using this characterization, we obtain the first linear-time (1 + ε)-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we recover previously known results for the Euclidean k-median problem and the Euclidean k-means problem in a simplified manner. Our results are based on a new analysis of an algorithm from [20].
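To make the objective concrete, here is a minimal sketch (not from the paper; all names are illustrative) of the sum-of-errors cost D(P, C) for a given center set, instantiated with the Kullback-Leibler divergence as the dissimilarity measure D:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete
    probability distributions given as equal-length sequences."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def clustering_cost(points, centers, D):
    """Sum-of-errors objective: D(P, C) = sum over p in P of
    the dissimilarity from p to its nearest center in C."""
    return sum(min(D(p, c) for c in centers) for p in points)

# Toy example: three distributions, one candidate center (k = 1).
P = [(0.5, 0.5), (0.6, 0.4), (0.1, 0.9)]
C = [(0.5, 0.5)]
cost = clustering_cost(P, C, kl_divergence)
```

Note that KL divergence is asymmetric and violates the triangle inequality, which is exactly why the sampling-based characterization above, rather than metric arguments, is needed for such measures.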