Optimal algorithms for approximate clustering

  • Authors:
  • Tomás Feder; Daniel Greene

  • Affiliations:
  • Computer Science Department, Stanford University, Stanford, CA; Xerox Palo Alto Research Center, Palo Alto, CA

  • Venue:
  • STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
  • Year:
  • 1988

Abstract

In a clustering problem, the aim is to partition a given set of n points in d-dimensional space into k groups, called clusters, so that points within each cluster are near each other. Two objective functions frequently used to measure the performance of a clustering algorithm are, for any L_q metric, (a) the maximum distance between pairs of points in the same cluster, and (b) the maximum distance between points in each cluster and a chosen cluster center; we refer to either measure as the cluster size.

We show that one cannot approximate the optimal cluster size for a fixed number of clusters within a factor close to 2 in polynomial time, for two or more dimensions, unless P=NP. We also present an algorithm that achieves this factor of 2 in time O(n log k), and show that this running time is optimal in the algebraic decision tree model. For a fixed cluster size, on the other hand, we give a polynomial time approximation scheme that estimates the optimal number of clusters under the second measure of cluster size within factors arbitrarily close to 1. Our approach is extended to provide approximation algorithms for the restricted centers, suppliers, and weighted suppliers problems that run in optimal O(n log k) time and achieve optimal or nearly optimal approximation bounds.
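
To make the two measures of cluster size concrete, the following is a minimal sketch of the classical farthest-first greedy heuristic (usually attributed to Gonzalez), which is known to stay within a factor of 2 of the optimal cluster size. It is not the algorithm of this paper: it runs in O(nk) time rather than O(n log k), and it is shown only as an illustration. The function names `farthest_first_centers` and `cluster_sizes` are hypothetical, the metric is assumed to be Euclidean (L_2), and the input is assumed to be a non-empty list of coordinate tuples.

```python
import math


def farthest_first_centers(points, k):
    """Pick up to k centers greedily: each new center is the point
    farthest from all centers chosen so far (illustrative sketch only)."""
    centers = [points[0]]  # assumption: any starting point is acceptable
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def cluster_sizes(points, centers):
    """Assign each point to its nearest center and return both measures:
    (a) the largest pairwise distance inside a cluster, and
    (b) the largest distance from a point to its cluster center."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        clusters[j].append(p)
    diameter = max(
        (math.dist(p, q) for c in clusters for p in c for q in c), default=0.0
    )
    radius = max(
        (math.dist(p, centers[j]) for j, c in enumerate(clusters) for p in c),
        default=0.0,
    )
    return diameter, radius


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    chosen = farthest_first_centers(pts, 2)
    print(chosen, cluster_sizes(pts, chosen))
```

On the four collinear points in the usage example, the heuristic separates the two well-spaced pairs, and both measures of cluster size equal the gap within a pair.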