We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the count-tracking problem, where there are k players, each holding a counter n_i that gets incremented over time, and the goal is to track an ε-approximation of their sum n = ∑_i n_i continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is Θ(k/ε · log N), where N is the final value of n when the tracking finishes, we show that with randomization, the communication cost can be reduced to Θ(√k/ε · log N). Our algorithm is simple and uses only O(1) space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems, frequency-tracking and rank-tracking, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large-scale data monitoring and analysis, and have been extensively studied in the literature.
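
To make the count-tracking setup concrete, the following is a minimal sketch of a simple deterministic threshold scheme of the kind the Θ(k/ε · log N) bound refers to. It is not the randomized algorithm of the paper. Each player re-sends its counter only after it has grown by more than a (1 + ε) factor since its last report, and the coordinator sums the last reported values. The function name `track_deterministic` and its parameters are hypothetical, chosen for illustration only.

```python
def track_deterministic(increments, k, eps):
    """Simulate a threshold-based deterministic count tracker (illustrative sketch).

    increments: iterable of player indices in 0..k-1, one entry per unit increment.
    Returns (number of messages sent, largest relative error seen at the coordinator).
    """
    local = [0] * k      # true counter n_i held by each player
    reported = [0] * k   # last value each player sent to the coordinator
    messages = 0
    max_rel_err = 0.0
    for i in increments:
        local[i] += 1
        # Player i reports once its counter exceeds (1 + eps) times its last report.
        if local[i] > (1 + eps) * reported[i]:
            reported[i] = local[i]
            messages += 1
        n = sum(local)           # true total
        estimate = sum(reported) # coordinator's view
        max_rel_err = max(max_rel_err, (n - estimate) / n)
    return messages, max_rel_err


if __name__ == "__main__":
    k, eps, total = 4, 0.1, 4000
    stream = [t % k for t in range(total)]  # round-robin increments (assumed workload)
    msgs, err = track_deterministic(stream, k, eps)
    print(f"messages={msgs}, max relative error={err:.4f}")
```

Because every player maintains n_i ≤ (1 + ε) times its last reported value, the coordinator's sum never underestimates n by more than an ε fraction, and each player sends roughly log N / log(1 + ε) messages plus about 1/ε initial ones. The abstract's claim is that randomization reduces the dependence on the number of players from k to √k.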