Sublinear-time approximation algorithms for clustering via random sampling

  • Authors:
  • Artur Czumaj; Christian Sohler

  • Affiliations:
  • Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom; Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33102 Paderborn, Germany

  • Venue:
  • Random Structures & Algorithms - Proceedings from the 12th International Conference “Random Structures and Algorithms”, August 1-5, 2005, Poznan, Poland
  • Year:
  • 2007

Abstract

We present a novel analysis of a random sampling approach for four clustering problems in metric spaces: k-median, k-means, min-sum k-clustering, and balanced k-median. For all these problems, we consider the following simple sampling scheme: select a small sample set of input points uniformly at random and then run some approximation algorithm on this sample set to compute an approximation of the best possible clustering of this set. Our main technical contribution is a significantly strengthened analysis of the approximation guarantee by this scheme for the clustering problems. The main motivation behind our analyses was to design sublinear-time algorithms for clustering problems. Our second contribution is the development of new approximation algorithms for the aforementioned clustering problems. Using our random sampling approach, we obtain for these problems the first approximation algorithms whose running time is independent of the input size and depends only on k and the diameter of the metric space.

© 2006 Wiley Periodicals, Inc. Random Struct. Alg., 2006

A preliminary extended abstract of this work appeared in Proceedings of the 31st Annual International Colloquium on Automata, Languages and Programming (ICALP), pp. 396-407, 2004.
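The sampling scheme described in the abstract can be illustrated with a minimal sketch for the k-median case. Everything named here is an assumption made for illustration and is not taken from the paper: the function names, the choice to sample with replacement, the `sample_size` parameter (the paper's bounds on the sample size are not reproduced), and the brute-force solver, which simply stands in for "some approximation algorithm" run on the sample.

```python
import itertools
import random


def kmedian_cost(points, centers, dist):
    """Sum over all points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def brute_force_kmedian(sample, k, dist):
    """Hypothetical stand-in solver: exact k-median on a small sample,
    trying every k-subset of distinct sample points as centers."""
    candidates = list(dict.fromkeys(sample))  # distinct points, order kept
    k = min(k, len(candidates))
    return min(
        itertools.combinations(candidates, k),
        key=lambda centers: kmedian_cost(sample, centers, dist),
    )


def sample_and_cluster(points, k, sample_size, dist,
                       solver=brute_force_kmedian, seed=None):
    """Draw a uniform random sample (with replacement, an assumption)
    and return the centers the solver computes on that sample only."""
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(sample_size)]
    return solver(sample, k, dist)


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0]
    line_metric = lambda a, b: abs(a - b)
    centers = sample_and_cluster(pts, k=2, sample_size=30,
                                 dist=line_metric, seed=0)
    print(centers, kmedian_cost(pts, centers, line_metric))
```

The solver only ever sees the sample, so its cost depends on `sample_size` and `k` rather than on the number of input points. The brute-force search is exponential in `k` and is meant only for tiny samples; any approximation algorithm for the chosen objective could be passed as `solver` instead.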