Polynomial-time approximation schemes for geometric min-sum median clustering

  • Authors: Rafail Ostrovsky; Yuval Rabani
  • Affiliations: Telcordia Technologies, Morristown, New Jersey; Technion-IIT, Haifa, Israel
  • Venue: Journal of the ACM (JACM)
  • Year: 2002

Abstract

The Johnson-Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n)-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings.

More specifically, we address the following clustering problem. Given n points in a larger set (e.g., ℝ^d) endowed with a distance function (e.g., L_2 distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube {0,1}^d with Hamming distance, and ℝ^d either with L_1 distance, or with L_2 distance, or with the square of L_2 distance. In all these settings, the best previous results were constant factor approximation guarantees.

We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.
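
The objective described in the abstract can be illustrated with a minimal Python sketch. It only evaluates the cost of a given set of centers under the four distance functions the abstract names; it is not the approximation scheme from the paper. The function names (hamming, l1, l2, l2_squared, clustering_cost) and the sample data are assumptions made for this illustration. For fixed centers, assigning each point to its nearest center gives the cheapest partition, so the cost is the sum of nearest-center distances.

```python
import math


def hamming(x, y):
    """Hamming distance: number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def l1(x, y):
    """L_1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_squared(x, y):
    """Square of the L_2 distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def l2(x, y):
    """L_2 (Euclidean) distance."""
    return math.sqrt(l2_squared(x, y))


def clustering_cost(points, centers, dist):
    """Sum, over all points, of the distance to the nearest center.

    Hypothetical helper for illustration: with the centers fixed,
    sending each point to its closest center minimizes the objective.
    """
    return sum(min(dist(p, c) for c in centers) for p in points)


# Illustrative data (not from the paper): four points in {0,1}^4, k = 2.
points = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)]
centers = [(0, 0, 0, 0), (1, 1, 1, 1)]

print(clustering_cost(points, centers, hamming))     # 2
print(clustering_cost(points, centers, l2_squared))  # 2
```

On binary vectors the Hamming, L_1, and squared L_2 distances coincide, which is why both printed costs agree. The hard part of the problem, which this sketch does not attempt, is choosing the k centers when the dimension d is part of the input.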