Tight bounds for distributed functional monitoring

Authors:
David P. Woodruff; Qin Zhang
Affiliations:
IBM Almaden, San Jose, CA, USA; MADALGO, Aarhus University, Aarhus, Denmark
Venue:
STOC '12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing
Year:
2012

Abstract

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are $k$ sites each tracking their input streams and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the $k$ streams. The goal is to minimize the number of bits communicated. Let the $p$-th frequency moment be defined as $F_p = \sum_i f_i^p$, where $f_i$ is the frequency of element $i$. We show the randomized communication complexity of estimating the number of distinct elements (that is, $F_0$) up to a $1+\varepsilon$ factor is $\Omega(k/\varepsilon^2)$, improving upon the previous $\Omega(k + 1/\varepsilon^2)$ bound and matching known upper bounds. For $F_p$, $p > 1$, we improve the previous $\Omega(k + 1/\varepsilon^2)$ communication bound to $\Omega(k^{p-1}/\varepsilon^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the first of any kind in distributed functional monitoring to depend on the product of $k$ and $1/\varepsilon^2$. Moreover, the lower bounds are for the static version of the distributed functional monitoring model where the coordinator only needs to compute the function at the time when all $k$ input streams end; surprisingly they almost match what is achievable in the (dynamic version of) distributed functional monitoring model where the coordinator needs to keep track of the function continuously at any time step. We also show that we can estimate $F_p$, for any $p > 1$, using $O(k^{p-1}\,\mathrm{poly}(\varepsilon^{-1}))$ communication. This drastically improves upon the previous $O(k^{2p+1} N^{1-2/p}\,\mathrm{poly}(\varepsilon^{-1}))$ bound of Cormode, Muthukrishnan, and Yi for general $p$, and their $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ bound for $p = 2$. For $p = 2$, our bound resolves their main open question. Our lower bounds are based on new direct sum theorems for approximate majority, and yield improvements to classical problems in the standard data stream model.
First, we improve the known lower bound for estimating $F_p$, $p > 2$, in $t$ passes from $\Omega(n^{1-2/p}/(\varepsilon^{2/p} t))$ to $\Omega(n^{1-2/p}/(\varepsilon^{4/p} t))$, giving the first bound that matches what we expect when $p = 2$ for any constant number of passes. Second, we give the first lower bound for estimating $F_0$ in $t$ passes with $\Omega(1/(\varepsilon^2 t))$ bits of space that does not use the hardness of the gap-hamming problem.
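To make the definition of the frequency moments concrete, the following minimal sketch computes $F_p$ exactly over the union of $k$ site streams. It is only an illustration of the quantity being approximated, not the protocol from the paper: it assumes the coordinator sees every item, so it saves no communication. The function name `frequency_moment` and the sample `sites` data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def frequency_moment(streams: Sequence[Iterable[Hashable]], p: float) -> float:
    """Exact F_p of the union of the given streams.

    Illustrative baseline only: every item is shipped to one place,
    which is what a communication-efficient protocol tries to avoid.
    """
    freq = Counter()
    for stream in streams:
        freq.update(stream)  # f_i = total count of element i across all sites
    if p == 0:
        # F_0 is the number of distinct elements (0^0 is treated as 0 here).
        return float(len(freq))
    return float(sum(f ** p for f in freq.values()))


# Hypothetical input: k = 3 sites, each holding a short stream.
sites = [["a", "b", "a"], ["b", "c"], ["a", "d", "d"]]

f0 = frequency_moment(sites, 0)  # distinct elements: a, b, c, d
f1 = frequency_moment(sites, 1)  # total number of items
f2 = frequency_moment(sites, 2)  # sum of squared frequencies
```

With the combined frequencies a = 3, b = 2, c = 1, d = 2, this gives $F_0 = 4$, $F_1 = 8$, and $F_2 = 9 + 4 + 1 + 4 = 18$. A monitoring protocol in the model above would aim to report each value within a $1+\varepsilon$ factor while sending far fewer bits than the full streams.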