Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with sub-constant error

  • Authors:
  • T. S. Jayram;David Woodruff

  • Affiliations:
  • IBM Almaden;IBM Almaden

  • Venue:
  • Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Johnson-Lindenstrauss transform is a dimensionality reduction technique with a wide range of applications to theoretical computer science. It is specified by a distribution over projection matrices from Rn → Rk where k ≪ d and states that k = O(ε−2 log 1/δ) dimensions suffice to approximate the norm of any fixed vector in Rd to within a factor of 1 ± ε with probability at least 1 − δ. In this paper we show that this bound on k is optimal up to a constant factor, improving upon a previous Ω(ε−2 log 1/δ)/log(1/ε) dimension bound of Alon. Our techniques are based on lower bounding the information cost of a novel one-way communication game and yield the first space lower bounds in a data stream model that depend on the error probability δ. For many streaming problems, the most naïve way of achieving error probability δ is to first achieve constant probability, then take the median of O(log 1/δ) independent repetitions. Our techniques show that for a wide range of problems this is in fact optimal! As an example, we show that estimating the lp-distance for any p ∈ [0, 2] requires Ω(ε−2 log n log 1/δ) space, even for vectors in {0, 1}n. This is optimal in all parameters and closes a long line of work on this problem. We also show the number of distinct elements requires Ω(ε−2 log 1/δ + log n) space, which is optimal if ε−2 = Ω(log n). We also improve previous lower bounds for entropy in the strict turnstile and general turnstile models by a multiplicative factor of Ω(log 1/δ). Finally, we give an application to one-way communication complexity under product distributions, showing that unlike in the case of constant δ, the VC-dimension does not characterize the complexity when δ = o(1).