Space lower bounds for distance approximation in the data stream model

  • Authors:
  • Michael Saks; Xiaodong Sun

  • Affiliations:
  • Rutgers University, New Brunswick, NJ; Rutgers University, New Brunswick, NJ

  • Venue:
  • STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
  • Year:
  • 2002

Abstract

We consider the problem of approximating the distance between two d-dimensional vectors x and y in the data stream model. In this model, the 2d coordinates are presented as a "stream" of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector (x or y) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of $L_p$ distance with $p \in [1,2]$, there are good approximation algorithms that run in space polylogarithmic in d (here we assume that each coordinate is an integer with $O(\log d)$ bits). Here we prove that no such algorithms exist for $p > 2$. In particular, we prove an optimal approximation-space tradeoff for approximating the $L_\infty$ distance of two vectors. We show that any randomized algorithm that approximates the $L_\infty$ distance of two length-d vectors within a factor of $d^{\delta}$ requires $\Omega(d^{1-4\delta})$ space. As a consequence, we show that for $p > 2/(1-4\delta)$, any randomized algorithm that approximates the $L_p$ distance of two length-d vectors within a factor of $d^{\delta}$ requires $\Omega(d^{1-2/p-4\delta})$ space. The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.
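To make the stream model concrete, here is a minimal Python sketch of the trivial baseline: it reads items in arbitrary order and computes the distance exactly by keeping one counter per coordinate. It is not an algorithm from the paper. The item layout `(vector_id, index, value)` and the name `exact_distance` are assumptions made for illustration. The baseline uses memory linear in d, which is the cost that small-space approximation algorithms try to avoid and that the lower bounds above show is partly unavoidable for large p.

```python
from typing import Iterable, Tuple

# Hypothetical item layout: (vector_id, index, value),
# where vector_id is 0 for x and 1 for y.
StreamItem = Tuple[int, int, int]


def exact_distance(stream: Iterable[StreamItem], d: int, p: float) -> float:
    """Exact L_p distance (pass p = float('inf') for L_infinity).

    Uses one counter per coordinate, so memory grows linearly with d.
    """
    diff = [0] * d  # accumulates x - y, coordinate by coordinate
    for vector_id, index, value in stream:
        # Items may arrive in any order: x contributes +value, y contributes -value.
        diff[index] += value if vector_id == 0 else -value
    if p == float("inf"):
        return float(max(abs(c) for c in diff))
    return sum(abs(c) ** p for c in diff) ** (1.0 / p)


# Example: x = (3, 0, 5) and y = (0, 0, 1), with the six items interleaved.
example_stream = [(1, 2, 1), (0, 0, 3), (0, 2, 5), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
l_inf = exact_distance(example_stream, d=3, p=float("inf"))
l_two = exact_distance(example_stream, d=3, p=2.0)
```

In this example the difference vector is (3, 0, 4), so the L_infinity distance is its largest absolute coordinate and the L_2 distance is its Euclidean length.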