Tight bounds for distributed functional monitoring

  • Authors:
  • David P. Woodruff; Qin Zhang

  • Affiliations:
  • IBM Almaden, San Jose, CA, USA; MADALGO, Aarhus University, Aarhus, Denmark

  • Venue:
  • STOC '12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing
  • Year:
  • 2012

Abstract

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), which has received recent attention. In this model there are $k$ sites, each tracking its input stream and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the $k$ streams. The goal is to minimize the number of bits communicated. Let the $p$-th frequency moment be defined as $F_p = \sum_i f_i^p$, where $f_i$ is the frequency of element $i$. We show that the randomized communication complexity of estimating the number of distinct elements (that is, $F_0$) up to a $1+\varepsilon$ factor is $\Omega(k/\varepsilon^2)$, improving upon the previous $\Omega(k + 1/\varepsilon^2)$ bound and matching known upper bounds. For $F_p$, $p > 1$, we improve the previous $\Omega(k + 1/\varepsilon^2)$ communication bound to $\Omega(k^{p-1}/\varepsilon^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the first of any kind in distributed functional monitoring to depend on the product of $k$ and $1/\varepsilon^2$. Moreover, the lower bounds are for the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all $k$ input streams end; surprisingly, they almost match what is achievable in the (dynamic version of the) distributed functional monitoring model, where the coordinator needs to keep track of the function continuously at every time step. We also show that we can estimate $F_p$, for any $p > 1$, using $O(k^{p-1} \operatorname{poly}(\varepsilon^{-1}))$ communication. This drastically improves upon the previous $O(k^{2p+1} N^{1-2/p} \operatorname{poly}(\varepsilon^{-1}))$ bound of Cormode, Muthukrishnan, and Yi for general $p$, and their $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ bound for $p = 2$. For $p = 2$, our bound resolves their main open question. Our lower bounds are based on new direct sum theorems for approximate majority, and yield improvements to classical problems in the standard data stream model.
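To make the definitions concrete, here is a minimal Python sketch, assuming each site's stream is simply a list of hashable items (the names `frequencies`, `frequency_moment`, and `lower_bounds` are hypothetical, for illustration only). It computes exactly the quantity the coordinator approximates, $F_p$ over the union of the $k$ streams, with $F_0$ counting distinct elements, and evaluates the old and new $F_0$ lower-bound expressions with constant factors dropped. It is not the paper's protocol: computing $F_p$ exactly this way means shipping every stream to one place, which is precisely the communication the model tries to avoid.

```python
from collections import Counter
from typing import Iterable, Sequence

# Illustrative sketch (hypothetical helper names): exact centralized computation,
# not a distributed monitoring protocol.


def frequencies(streams: Sequence[Iterable]) -> Counter:
    """Exact frequency f_i of each element i over the union of the k site streams."""
    freq = Counter()
    for stream in streams:
        freq.update(stream)  # adds one to the count of every item seen at this site
    return freq


def frequency_moment(freq: Counter, p: float) -> float:
    """F_p = sum_i f_i^p; for p = 0 this is the number of distinct elements."""
    if p == 0:
        # Convention for F_0: count elements with nonzero frequency.
        return sum(1 for f in freq.values() if f > 0)
    return sum(f ** p for f in freq.values())


def lower_bounds(k: int, eps: float) -> tuple[float, float]:
    """Old Omega(k + 1/eps^2) vs new Omega(k/eps^2) for F_0, constants dropped."""
    return k + 1 / eps**2, k / eps**2


# Three sites (k = 3), each observing its own stream.
sites = [["a", "b", "a"], ["b", "c"], ["a", "d", "d"]]
freq = frequencies(sites)
print(frequency_moment(freq, 0))  # distinct elements a, b, c, d -> 4
print(frequency_moment(freq, 1))  # total number of items -> 8
print(frequency_moment(freq, 2))  # 3^2 + 2^2 + 1^2 + 2^2 -> 18
print(lower_bounds(100, 0.01))    # approximately (10100.0, 1000000.0)
```

With $k = 100$ and $\varepsilon = 0.01$, the old expression is about $10^4$ while the new one is $10^6$, which illustrates why a bound depending on the product of $k$ and $1/\varepsilon^2$ is much stronger than one depending on their sum.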
First, we improve the known lower bound for estimating $F_p$, $p > 2$, in $t$ passes from $\Omega(n^{1-2/p}/(\varepsilon^{2/p} t))$ to $\Omega(n^{1-2/p}/(\varepsilon^{4/p} t))$, giving the first bound that matches what we expect when $p = 2$ for any constant number of passes. Second, we give the first lower bound for estimating $F_0$ in $t$ passes with $\Omega(1/(\varepsilon^2 t))$ bits of space that does not use the hardness of the gap-hamming problem.
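As a quick check of the remark about $p = 2$ (a worked substitution, not taken from the paper), formally setting $p = 2$ makes $n^{1-2/p} = n^0 = 1$, so the old and new pass bounds reduce to:

$$
\left.\frac{n^{1-2/p}}{\varepsilon^{2/p}\,t}\right|_{p=2} = \frac{1}{\varepsilon\,t},
\qquad
\left.\frac{n^{1-2/p}}{\varepsilon^{4/p}\,t}\right|_{p=2} = \frac{1}{\varepsilon^{2}\,t}.
$$

For a constant number of passes $t$, the new expression therefore scales as $1/\varepsilon^2$, while the old one scales only as $1/\varepsilon$.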