Performance guarantees for hierarchical clustering

  • Authors:
  • Sanjoy Dasgupta; Philip M. Long

  • Affiliations:
  • Department of Computer Science and Engineering, University of California, San Diego, USA; Genome Institute of Singapore

  • Venue:
  • Journal of Computer and System Sciences - Special issue on COLT 2002
  • Year:
  • 2005

Abstract

We show that for any data set in any metric space, it is possible to construct a hierarchical clustering with the guarantee that for every k, the induced k-clustering has cost at most eight times that of the optimal k-clustering. Here the cost of a clustering is taken to be the maximum radius of its clusters. Our algorithm is similar in simplicity and efficiency to popular agglomerative heuristics for hierarchical clustering, and we show that these heuristics have unbounded approximation factors.
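
To make the cost measure concrete, the following is a minimal sketch of the "maximum radius" cost of a k-clustering, assuming each cluster is represented by a center and every point is assigned to its nearest center. The function name `clustering_cost`, the Euclidean distance default, and the sample points are illustrative assumptions, not taken from the paper; any metric can be passed in as `dist`.

```python
import math

def clustering_cost(points, centers, dist=math.dist):
    """Return the maximum cluster radius (hypothetical helper).

    Each point is assigned to its nearest center; the cost is the
    largest point-to-assigned-center distance over the whole data set.
    `dist` may be any metric; Euclidean distance is only a default.
    """
    return max(min(dist(p, c) for c in centers) for p in points)

# Illustrative data: two well-separated pairs of points on a line.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]

cost_k2 = clustering_cost(points, [(0, 0), (10, 0)])  # one center per pair
cost_k1 = clustering_cost(points, [(0, 0)])           # a single center

print(cost_k2, cost_k1)
```

Under this measure, the guarantee in the abstract says that for every k, the k-clustering induced by the hierarchy has cost at most 8 times the smallest cost achievable by any k-clustering of the same data.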