On k-Median clustering in high dimensions

  • Authors:
  • Ke Chen

  • Affiliations:
  • University of Illinois, Urbana

  • Venue:
  • SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2006

Abstract

We study approximation algorithms for k-median clustering. We obtain small coresets for k-median clustering in metric spaces as well as in Euclidean spaces. Specifically, in $\mathbb{R}^d$, the size of these coresets depends only polynomially on $d$. This leads to a $(1 + \varepsilon)$-approximation algorithm for k-median clustering in $\mathbb{R}^d$, with running time $O(ndk + 2^{(k/\varepsilon)^{O(1)}} d^2 n^{\sigma})$, for any $\sigma > 0$. This is an improvement over previous results [5, 20, 21]. We also provide fast constant-factor approximation algorithms for k-median clustering in finite metric spaces.

We use these coresets to compute a $(1 + \varepsilon)$-approximate k-median clustering in the streaming model of computation, using only $O(k^2 d \varepsilon^{-2} \log^8 n)$ space, where the points are taken from $\mathbb{R}^d$. This is the first streaming algorithm for this problem whose space complexity depends only polynomially on the dimension.
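To make the objective concrete, the following is a minimal sketch of the k-median cost on a weighted point set, which is the quantity a coreset is meant to preserve approximately for every choice of k centers. The function names `kmedian_cost` and `uniform_sample_summary` are hypothetical, and the uniform sampling shown is only a naive stand-in for illustration; it is not the coreset construction from the paper and carries none of its guarantees.

```python
import math
import random


def kmedian_cost(points, centers, weights=None):
    """k-median objective: sum of weight * distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )


def uniform_sample_summary(points, m, seed=0):
    """Naive weighted summary (assumption, NOT the paper's coreset):
    draw m points uniformly with replacement, each with weight n / m,
    so the total weight equals the original number of points."""
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(m)]
    weight = len(points) / m
    return sample, [weight] * m


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
    # One center at the origin: distances are 0, 5 and 10.
    print(kmedian_cost(pts, [(0.0, 0.0)]))
    # Compare the full cost with the cost evaluated on a small weighted summary.
    summary, w = uniform_sample_summary(pts, m=2)
    print(kmedian_cost(summary, [(0.0, 0.0)], w))
```

A coreset in the sense of the abstract would be a small weighted set for which the second quantity is within a factor of $(1 \pm \varepsilon)$ of the first for all candidate center sets; uniform sampling does not ensure this in general.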