Sublinear-time approximation algorithms for clustering via random sampling

  • Authors:
  • Artur Czumaj; Christian Sohler

  • Affiliations:
  • Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom; Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33102 Paderborn, Germany

  • Venue:
  • Random Structures & Algorithms - Proceedings from the 12th International Conference “Random Structures and Algorithms”, August 1-5, 2005, Poznan, Poland
  • Year:
  • 2007

Abstract

We present a novel analysis of a random sampling approach for four clustering problems in metric spaces: k-median, k-means, min-sum k-clustering, and balanced k-median. For all these problems, we consider the following simple sampling scheme: select a small sample set of input points uniformly at random and then run some approximation algorithm on this sample set to compute an approximation of the best possible clustering of this set. Our main technical contribution is a significantly strengthened analysis of the approximation guarantee of this scheme for the clustering problems. The main motivation behind our analyses was to design sublinear-time algorithms for clustering problems. Our second contribution is the development of new approximation algorithms for the aforementioned clustering problems. Using our random sampling approach, we obtain, for the first time, approximation algorithms for these problems whose running time is independent of the input size and depends only on k and the diameter of the metric space.

© 2006 Wiley Periodicals, Inc. Random Struct. Alg., 2007

A preliminary extended abstract of this work appeared in Proceedings of the 31st Annual International Colloquium on Automata, Languages and Programming (ICALP), pp. 396-407, 2004.
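
As an illustration only, the sampling scheme described in the abstract (draw a small uniform sample, then run a clustering algorithm on the sample alone) might be sketched in Python for k-median as follows. Everything named here is an assumption made for the sketch, not taken from the paper: the function names `kmedian_cost`, `solve_on_sample`, and `sample_and_cluster`, the exhaustive search used as a stand-in for "some approximation algorithm", and the choice of `sample_size`, which the abstract does not specify.

```python
import itertools
import random


def kmedian_cost(points, centers, dist):
    """k-median cost: sum over points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def solve_on_sample(sample, k, dist):
    """Stand-in subroutine (an assumption, not the paper's algorithm).

    Tries every k-subset of the sample as a center set and keeps the
    cheapest one. Any k-median approximation algorithm could replace it.
    """
    candidates = itertools.combinations(sample, k)
    return min(candidates, key=lambda cs: kmedian_cost(sample, cs, dist))


def sample_and_cluster(points, k, sample_size, dist, rng=None):
    """Draw a uniform random sample and cluster only the sample.

    The work done after sampling depends on sample_size and k,
    not on len(points).
    """
    rng = rng or random.Random()
    sample = rng.sample(points, min(sample_size, len(points)))
    return list(solve_on_sample(sample, k, dist))


def abs_dist(a, b):
    """Metric on the real line, used only for the demo below."""
    return abs(a - b)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative data: two well-separated groups on the real line.
    points = [rng.uniform(0, 1) for _ in range(500)]
    points += [rng.uniform(10, 11) for _ in range(500)]
    centers = sample_and_cluster(points, k=2, sample_size=20,
                                 dist=abs_dist, rng=rng)
    print("centers:", sorted(centers))
    print("average cost:", kmedian_cost(points, centers, abs_dist) / len(points))
```

The final cost is evaluated on the full point set only to inspect the result; the clustering step itself never looks beyond the sample. How large the sample must be for a given approximation guarantee is the subject of the paper's analysis and is not reproduced here.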