On coresets for k-means and k-median clustering

  • Authors:
  • Sariel Har-Peled; Soham Mazumdar

  • Affiliations:
  • University of Illinois, Urbana-Champaign, Urbana, IL; University of Illinois, Urbana-Champaign, Urbana, IL

  • Venue:
  • STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
  • Year:
  • 2004

Abstract

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in R^d, one can compute a weighted set S ⊆ P, of size O(k ε^{-d} log n), such that one can compute the k-median/means clustering on S instead of on P, and get a (1+ε)-approximation. As a result, we improve the fastest known algorithms for (1+ε)-approximate k-means and k-median. Our algorithms have linear running time for a fixed k and ε. In addition, we can maintain the (1+ε)-approximate k-median or k-means clustering of a stream when points are only inserted, using polylogarithmic space and update time.
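
To make the coreset idea concrete, the following is a minimal sketch in Python of a weighted subset S ⊆ P whose k-means cost stays close to the cost on P for a given set of centers. It is not the construction from the paper: as a simplifying assumption it snaps points to a single uniform grid and keeps one input point per occupied cell, weighted by the number of points in that cell. The names `kmeans_cost` and `grid_coreset` and the parameter `cell` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum of w * (squared distance to the nearest center)."""
    return sum(
        w * min(math.dist(p, c) ** 2 for c in centers)
        for p, w in zip(points, weights)
    )


def grid_coreset(points, cell):
    """Illustrative uniform-grid summary (an assumption, not the paper's method).

    Points are grouped by the grid cell of side `cell` that contains them.
    Each occupied cell contributes one representative, taken from the input
    itself so that S is a subset of P, with weight equal to the cell's count.
    """
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell) for x in p)
        cells[key].append(p)
    reps = [members[0] for members in cells.values()]
    weights = [len(members) for members in cells.values()]
    return reps, weights


# Two synthetic Gaussian clusters in the plane (made-up data for the demo).
random.seed(0)
P = [
    (random.gauss(cx, 1.0), random.gauss(cy, 1.0))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(500)
]

S, w = grid_coreset(P, cell=0.25)
centers = [(0.0, 0.0), (10.0, 10.0)]

full_cost = kmeans_cost(P, [1] * len(P), centers)
coreset_cost = kmeans_cost(S, w, centers)
print(len(P), len(S), full_cost, coreset_cost)
```

In this sketch every point moves by at most one cell diagonal when it is replaced by its representative, so the error in the cost shrinks as `cell` shrinks, at the price of a larger summary. A uniform grid does not by itself give the size bound stated in the abstract.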