Coresets and approximate clustering for Bregman divergences

  • Authors:
  • Marcel R. Ackermann;Johannes Blömer

  • Affiliations:
  • University of Paderborn, Germany;University of Paderborn, Germany

  • Venue:
  • SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study the generalized k-median problem with respect to a Bregman divergence Dφ. Given a finite set P ⊆ Rd of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = ΣpεP mincεC {DΦ(p, c)} is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text classification, and speech processing. We give the first coreset construction for this problem for a large subclass of Bregman divergences, including important dissimilarity measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. Using these coresets, we give a (1 + ε)-approximation algorithm for the Bregman k-median problem with running time O (dkn + d22(k/c)thetas;(1) logk+2n). This result improves over the previousely fastest known (1 + ε)-approximation algorithm from [1]. Unlike the analysis of most coreset constructions our analysis does not rely on the construction of ε-nets. Instead, we prove our results by purely combinatorial means.