A PTAS for k-means clustering based on weak coresets

  • Authors:
  • Dan Feldman; Morteza Monemizadeh; Christian Sohler

  • Affiliations:
  • Tel Aviv University, Tel Aviv, Israel; University of Paderborn, Paderborn, Germany; Heinz Nixdorf Institute and Department of Computer Science, Paderborn, Germany

  • Venue:
  • SCG '07: Proceedings of the Twenty-Third Annual Symposium on Computational Geometry
  • Year:
  • 2007

Abstract

Given a point set $P \subseteq \mathbb{R}^d$, the k-means clustering problem is to find a set $C = (c_1, \dots, c_k)$ of $k$ points and a partition of $P$ into $k$ clusters $C_1, \dots, C_k$ such that the sum of squared errors $\sum_{i=1}^{k} \sum_{p \in C_i} \|p - c_i\|_2^2$ is minimized. For given centers, this cost function is minimized by assigning each point to its nearest center. The k-means cost function is probably the most widely used cost function in the area of clustering. In this paper we show that every unweighted point set $P$ has a weak $(\varepsilon, k)$-coreset of size $\mathrm{Poly}(k, 1/\varepsilon)$ for the k-means clustering problem, i.e., its size is independent of the cardinality $|P|$ of the point set and of the dimension $d$ of the Euclidean space $\mathbb{R}^d$. A weak coreset is a weighted set $S \subseteq P$ together with a set $T$ such that $T$ contains a $(1+\varepsilon)$-approximation for the optimal cluster centers from $P$, and for every set of $k$ centers from $T$ the cost of the centers for $S$ is a $(1 \pm \varepsilon)$-approximation of the cost for $P$. We apply our weak coreset to obtain a PTAS for the k-means clustering problem with running time $O(nkd + d \cdot \mathrm{Poly}(k/\varepsilon) + 2^{\tilde{O}(k/\varepsilon)})$.
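
The cost function and the $(1 \pm \varepsilon)$ comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the coreset construction from the paper: the helper names (`sq_dist`, `kmeans_cost`, `assign`, `within_factor`) and the toy sets `P`, `S`, and `weights` are assumptions chosen by hand purely for illustration. The sketch only shows how the k-means cost of a weighted subset is compared with the cost of the full point set for one fixed set of centers.

```python
from typing import List, Optional, Sequence

Point = Sequence[float]


def sq_dist(p: Point, c: Point) -> float:
    """Squared Euclidean distance ||p - c||_2^2."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans_cost(points: List[Point], centers: List[Point],
                weights: Optional[List[float]] = None) -> float:
    """Weighted sum of squared distances from each point to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted point set
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for each point (the cost-minimizing partition)."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def within_factor(cost_s: float, cost_p: float, eps: float) -> bool:
    """Check whether cost_s is a (1 +/- eps)-approximation of cost_p."""
    return (1 - eps) * cost_p <= cost_s <= (1 + eps) * cost_p


# Toy data (hypothetical, hand-picked; NOT produced by the paper's construction).
P = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

# A weighted subset S of P: each kept point stands in for two original points.
S = [(0.0, 0.0), (10.0, 2.0)]
weights = [2.0, 2.0]

cost_P = kmeans_cost(P, centers)
cost_S = kmeans_cost(S, centers, weights)
labels = assign(P, centers)
ok = within_factor(cost_S, cost_P, eps=0.1)
```

In this toy instance the weighted cost of `S` equals the cost of `P` for the chosen centers. A weak coreset in the sense of the abstract only needs this guarantee for center sets drawn from the candidate set $T$, not for every possible set of $k$ centers.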