Coresets and approximate clustering for Bregman divergences

Authors:
Marcel R. Ackermann;Johannes Blömer
Affiliations:
University of Paderborn, Germany;University of Paderborn, Germany
Venue:
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Year:
2009

Citing 17
Cited 6

Decision theoretic generalizations of the PAC model for neural net and other learning applications

Information and Computation
Data clustering: a review

ACM Computing Surveys (CSUR)
Sublinear time approximate clustering

SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Approximate clustering via core-sets

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Parallel Optimization: Theory, Algorithms and Applications

Parallel Optimization: Theory, Algorithms and Applications
A divisive information theoretic feature clustering algorithm for text classification

The Journal of Machine Learning Research
On coresets for k-means and k-median clustering

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
A Simple Linear Time (1 + ε)-Approximation Algorithm for k-Means Clustering in Any Dimensions

FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
Coresets in dynamic geometric data streams

Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Smaller coresets for k-median and k-means clustering

SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
On k-Median clustering in high dimensions

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Clustering with Bregman Divergences

The Journal of Machine Learning Research
A PTAS for k-means clustering based on weak coresets

SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
On Bregman Voronoi diagrams

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
k-means++: the advantages of careful seeding

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Clustering for metric and non-metric distance measures

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Survey of clustering algorithms

IEEE Transactions on Neural Networks

Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences

ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation
Approximation algorithms for tensor clustering

ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
Clustering for metric and nonmetric distance measures

ACM Transactions on Algorithms (TALG)
Smoothed Analysis of the k-Means Method

Journal of the ACM (JACM)
Bregman clustering for separable instances

SWAT'10 Proceedings of the 12th Scandinavian conference on Algorithm Theory
Approximate Bregman near neighbors in sublinear time: beyond the triangle inequality

Proceedings of the twenty-eighth annual symposium on Computational geometry

Abstract

We study the generalized k-median problem with respect to a Bregman divergence $D_\varphi$. Given a finite set $P \subseteq \mathbb{R}^d$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $\mathrm{cost}(P, C) = \sum_{p \in P} \min_{c \in C} D_\varphi(p, c)$ is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text classification, and speech processing. We give the first coreset construction for this problem for a large subclass of Bregman divergences, including important dissimilarity measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. Using these coresets, we give a $(1 + \varepsilon)$-approximation algorithm for the Bregman k-median problem with running time $O\left(dkn + d^2 2^{(k/\varepsilon)^{\Theta(1)}} \log^{k+2} n\right)$. This result improves over the previously fastest known $(1 + \varepsilon)$-approximation algorithm from [1]. Unlike the analysis of most coreset constructions, our analysis does not rely on the construction of $\varepsilon$-nets. Instead, we prove our results by purely combinatorial means.
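
To make the objective in the abstract concrete, the following is a minimal sketch of the cost function for the two divergences the abstract names. It is not the paper's coreset construction or approximation algorithm; it only evaluates the objective for a given set of centers. The function names (kl_divergence, itakura_saito, bregman_cost, centroid) are hypothetical, the inputs are assumed to be strictly positive vectors, and the Kullback-Leibler divergence is written in its generalized form, which reduces to the usual one when both arguments sum to 1.

```python
import math

def kl_divergence(p, c):
    """Generalized Kullback-Leibler divergence D(p, c).
    Assumes all coordinates of p and c are strictly positive."""
    return sum(pi * math.log(pi / ci) - pi + ci for pi, ci in zip(p, c))

def itakura_saito(p, c):
    """Itakura-Saito divergence D(p, c).
    Assumes all coordinates of p and c are strictly positive."""
    return sum(pi / ci - math.log(pi / ci) - 1.0 for pi, ci in zip(p, c))

def bregman_cost(points, centers, divergence):
    """cost(P, C): sum over p in P of the minimum over c in C of D(p, c)."""
    return sum(min(divergence(p, c) for c in centers) for p in points)

def centroid(points):
    """Coordinate-wise mean of the points. For a Bregman divergence,
    the mean is known to minimize the sum of D(p, c) over c (the k = 1 case)."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

if __name__ == "__main__":
    # Illustrative data only: three probability vectors in R^2.
    P = [[0.5, 0.5], [0.4, 0.6], [0.9, 0.1]]
    C = [[0.45, 0.55], [0.9, 0.1]]
    print("KL cost:", bregman_cost(P, C, kl_divergence))
    print("IS cost:", bregman_cost(P, C, itakura_saito))
    print("1-median center:", centroid(P))
```

Note that the argument order matters: Bregman divergences are in general asymmetric, so D(p, c) and D(c, p) can differ, and the sketch follows the abstract in placing the data point first and the center second.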