Approximations and optimal geometric divide-and-conquer
STOC '91 Proceedings of the twenty-third annual ACM symposium on Theory of computing
On linear-time deterministic algorithms for optimization problems in fixed dimension
Journal of Algorithms
Approximate medians and other quantiles in one pass and with limited memory
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Improved bounds on the sample complexity of learning
Journal of Computer and System Sciences
The discrepancy method: randomness and complexity
The discrepancy method: randomness and complexity
An Approximate L1-Difference Algorithm for Massive Data Streams
SIAM Journal on Computing
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Medians and beyond: new aggregation techniques for sensor networks
SenSys '04 Proceedings of the 2nd international conference on Embedded networked sensor systems
Finding (Recently) Frequent Items in Distributed Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Power-conserving computation of order-statistics over sensor networks
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
TAG: a Tiny AGgregation service for Ad-Hoc sensor networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Tributaries and deltas: efficient and robust aggregation in sensor network streams
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Stable distributions, pseudorandom generators, embeddings, and data stream computation
Journal of the ACM (JACM)
An integrated efficient solution for computing frequent and top-k elements in data streams
ACM Transactions on Database Systems (TODS)
Range Counting over Multidimensional Data Streams
Discrete & Computational Geometry
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
On distributing symmetric streaming computations
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Algorithms for ε-Approximations of Terrains
ICALP '08 Proceedings of the 35th international colloquium on Automata, Languages and Programming, Part I
Finding frequent items in data streams
Proceedings of the VLDB Endowment
Tight results for clustering and summarizing data streams
Proceedings of the 12th International Conference on Database Theory
Space-optimal heavy hitters with strong error bounds
ACM Transactions on Database Systems (TODS)
Constructive Algorithms for Discrepancy Minimization
FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
Sampling based algorithms for quantile computation in sensor networks
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Fast moment estimation in data streams in optimal space
Proceedings of the forty-third annual ACM symposium on Theory of computing
On Range Searching in the Group Model and Combinatorial Discrepancy
FOCS '11 Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science
Analyzing graph structure via linear measurements
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Semidefinite optimization in discrepancy theory
Mathematical Programming: Series A and B - Special Issue on ISMP 2012
Constructive Discrepancy Minimization by Walking on the Edges
FOCS '12 Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science
Hi-index | 0.00 |
We study the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two datasets, there is a way to merge the two summaries into a single summary on the two datasets combined together, while preserving the error and size guarantees. This property means that the summaries can be merged in a way akin to other algebraic operators such as sum and max, which is especially useful for computing summaries on massive distributed data. Several data summaries are trivially mergeable by construction, most notably all the sketches that are linear functions of the datasets. But some other fundamental ones, like those for heavy hitters and quantiles, are not (known to be) mergeable. In this article, we demonstrate that these summaries are indeed mergeable or can be made mergeable after appropriate modifications. Specifically, we show that for ϵ-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/ϵ); for ϵ-approximate quantiles, there is a deterministic summary of size O((1/ϵ) log(ϵ n)) that has a restricted form of mergeability, and a randomized one of size O((1/ϵ) log3/2(1/ϵ)) with full mergeability. We also extend our results to geometric summaries such as ϵ-approximations which permit approximate multidimensional range counting queries. While most of the results in this article are theoretical in nature, some of the algorithms are actually very simple and even perform better than the previously best known algorithms, which we demonstrate through experiments in a simulated sensor network. We also achieve two results of independent interest: (1) we provide the best known randomized streaming bound for ϵ-approximate quantiles that depends only on ϵ, of size O((1/ϵ) log3/2(1/ϵ)), and (2) we demonstrate that the MG and the SpaceSaving summaries for heavy hitters are isomorphic.