Bregman clustering for separable instances

  • Authors: Marcel R. Ackermann; Johannes Blömer

  • Affiliations: Department of Computer Science, University of Paderborn, Germany (both authors)

  • Venue: SWAT'10 Proceedings of the 12th Scandinavian conference on Algorithm Theory
  • Year: 2010

Abstract

The Bregman k-median problem is defined as follows. Given a Bregman divergence $D_\varphi$ and a finite set $P \subseteq \mathbb{R}^d$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $\mathrm{cost}(P,C) = \sum_{p \in P} \min_{c \in C} D_\varphi(p,c)$ is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We study a generalization of the k-means++ seeding of Arthur and Vassilvitskii (SODA '07). We prove for an almost arbitrary Bregman divergence that if the input set consists of $k$ well-separated clusters, then with probability $2^{-\mathcal{O}(k)}$ this seeding step alone finds an $\mathcal{O}(1)$-approximate solution. Thereby, we generalize an earlier result of Ostrovsky et al. (FOCS '06) from the case of the Euclidean k-means problem to the Bregman k-median problem. Additionally, this result leads to a constant-factor approximation algorithm for the Bregman k-median problem using at most $2^{\mathcal{O}(k)} n$ arithmetic operations, including evaluations of the Bregman divergence $D_\varphi$.
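
The abstract does not spell out the seeding procedure, so the following is only a minimal Python sketch of what a k-means++-style seeding could look like when squared Euclidean distance is replaced by a Bregman divergence. It assumes the natural generalization: the first center is drawn uniformly from the input, and each further center is drawn with probability proportional to the divergence from a point to its nearest already-chosen center. The function names (`bregman_seeding`, `cost`, `kl_divergence`, `squared_euclidean`) are hypothetical and chosen for illustration; the two divergences shown are standard examples of Bregman divergences, not a selection made by the authors.

```python
import math
import random


def squared_euclidean(p, c):
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kl_divergence(p, c):
    """Generalized KL divergence, generated by phi(x) = sum_i x_i log x_i.

    Assumes all coordinates are strictly positive. The result is clamped
    at 0 to guard against tiny negative values from floating-point rounding.
    """
    d = sum(pi * math.log(pi / ci) - pi + ci for pi, ci in zip(p, c))
    return max(0.0, d)


def cost(points, centers, divergence):
    """cost(P, C) = sum over p in P of min over c in C of D(p, c)."""
    return sum(min(divergence(p, c) for c in centers) for p in points)


def bregman_seeding(points, k, divergence, rng=None):
    """Pick k centers from `points` by divergence-proportional sampling.

    Assumed scheme (not taken verbatim from the paper): first center
    uniform at random, later centers with probability proportional to
    the divergence to the nearest center chosen so far.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    # nearest[i] holds min over chosen centers c of D(points[i], c).
    nearest = [divergence(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(nearest) <= 0.0:
            # Every point coincides with a center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            idx = rng.choices(range(len(points)), weights=nearest, k=1)[0]
            new_center = points[idx]
        centers.append(new_center)
        nearest = [min(d, divergence(p, new_center))
                   for d, p in zip(nearest, points)]
    return centers


if __name__ == "__main__":
    # Two visibly separated groups of strictly positive points.
    data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2),
            (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
    seeds = bregman_seeding(data, 2, kl_divergence, random.Random(0))
    print(seeds, cost(data, seeds, kl_divergence))
```

Note that the divergence is always evaluated as `divergence(point, center)`, matching the argument order of $D_\varphi(p,c)$ in the cost function; Bregman divergences are in general not symmetric, so the order matters. Each round costs one divergence evaluation per point, so a single seeding run uses on the order of $kn$ evaluations.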