Linear time algorithms for clustering problems in any dimensions

  • Authors: Amit Kumar; Yogish Sabharwal; Sandeep Sen

  • Affiliations: Dept of Comp Sc & Engg, Indian Institute of Technology, New Delhi, India; IBM India Research Lab, Block-I, IIT Delhi, Hauz Khas, New Delhi, India; Dept of Comp Sc & Engg, Indian Institute of Technology, Kharagpur, India

  • Venue: ICALP'05: Proceedings of the 32nd International Conference on Automata, Languages and Programming
  • Year: 2005

Abstract

We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems. These are the problems that satisfy certain properties: the existence of a random sampling procedure, and tightness. We prove these properties for the k-median and the discrete k-means clustering problems. This yields $(1+\varepsilon)$-approximation algorithms for both problems with running time $O(2^{(k/\varepsilon)^{O(1)}}\, dn)$. These are the first algorithms for these problems that run in time linear in the size of the input ($nd$ for $n$ points in $d$ dimensions), with no dependence on the dimension in the exponent, assuming $k$ and $\varepsilon$ to be fixed. A key ingredient of the k-median result is a $(1+\varepsilon)$-approximation algorithm for the 1-median problem with running time $O(2^{(1/\varepsilon)^{O(1)}}\, d)$, which is independent of $n$. The previous best known algorithm for this problem had running time linear in $n$.
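The role of random sampling in the 1-median result can be illustrated with a much simpler heuristic. The sketch below is not the authors' algorithm and does not by itself achieve the $(1+\varepsilon)$ guarantee. It rests on one standard fact: by the triangle inequality, a uniformly random input point used as the center has expected 1-median cost at most twice the optimum. The sketch draws a few random candidate centers and scores each one on a small random evaluation sample, so that once the samples are drawn the work does not grow with $n$. The function names `sampled_one_median` and `estimated_cost` and the parameters `num_candidates`, `sample_size` and `seed` are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def estimated_cost(center: Point, sample: Sequence[Point]) -> float:
    """Average Euclidean distance from `center` to the sampled points.

    Multiplying by n gives an unbiased estimate of the full 1-median cost
    when `sample` is drawn uniformly with replacement from the input.
    """
    return sum(math.dist(center, p) for p in sample) / len(sample)


def sampled_one_median(points: Sequence[Point], num_candidates: int = 30,
                       sample_size: int = 50, seed: int = 0) -> Point:
    """Hypothetical sampling heuristic for the 1-median problem.

    Candidates and the evaluation sample are drawn uniformly at random with
    replacement; the remaining work is num_candidates * sample_size distance
    computations of O(d) each, independent of n. This is an illustration of
    random sampling only, not the (1+eps)-approximation from the paper.
    """
    rng = random.Random(seed)
    candidates = rng.choices(points, k=num_candidates)  # random input points
    eval_sample = rng.choices(points, k=sample_size)    # sample used for scoring
    # Return the candidate whose estimated cost on the sample is smallest.
    return min(candidates, key=lambda c: estimated_cost(c, eval_sample))
```

For example, on 1000 copies of the origin plus one far outlier at `(1000.0, 0.0)`, the heuristic almost surely returns the origin. Returning the outlier would require every candidate draw to hit it, or the evaluation sample to be dominated by it.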