Better streaming algorithms for clustering problems

  • Authors:
  • Moses Charikar;Liadan O'Callaghan;Rina Panigrahy

  • Affiliations:
  • Princeton University;Stanford University;Cisco Systems

  • Venue:
  • Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
  • Year:
  • 2003

Abstract

We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space O(k polylog n). This is a significant improvement over the previous best algorithm, which yielded a 2^O(1/ε) approximation using O(n^ε) space. Next we give a streaming algorithm for the k-Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.
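
For readers unfamiliar with the objective, the following is a minimal sketch of the k-Median cost and its outlier variant. It is not the paper's streaming algorithm: it stores all points in memory, assumes Euclidean distance, and the function name `kmedian_cost` and parameter `num_outliers` are hypothetical names chosen for illustration.

```python
import math


def kmedian_cost(points, centers, num_outliers=0):
    """Sum of distances from each point to its nearest center.

    Assumption for illustration: Euclidean distance. In the outlier
    variant, the `num_outliers` points farthest from their nearest
    center are excluded from the sum.
    """
    # Distance from every point to its closest center, smallest first.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Drop the largest distances (the points treated as outliers).
    kept = dists[: len(dists) - num_outliers]
    return sum(kept)


points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
cost_all = kmedian_cost(points, centers)
cost_one_outlier = kmedian_cost(points, centers, num_outliers=1)
```

In this toy input the single far point at (100, 0) dominates the cost, which shows why allowing a few more points to be discarded as outliers can change the objective substantially.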