We study the mergeability of data summaries. Informally, mergeability requires that, given two summaries of two data sets, there is a way to merge them into a single summary of the union of the two data sets while preserving the error and size guarantees. This property means that summaries can be combined in the same way as algebraic operators such as sum and max, which is especially useful for computing summaries of massive distributed data. Several data summaries are trivially mergeable by construction, most notably all sketches that are linear functions of the data sets. But some other fundamental ones, such as those for heavy hitters and quantiles, are not (known to be) mergeable. In this paper, we demonstrate that these summaries are indeed mergeable, or can be made mergeable after appropriate modifications. Specifically, we show that for ε-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/ε); for ε-approximate quantiles, there is a deterministic summary of size O((1/ε) log(εn)) that has a restricted form of mergeability, and a randomized one of size O((1/ε) log^{3/2}(1/ε)) with full mergeability. We also extend our results to geometric summaries such as ε-approximations and ε-kernels. In addition, we achieve two results of independent interest: (1) we provide the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)), and (2) we demonstrate that the MG and the SpaceSaving summaries for heavy hitters are isomorphic.
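To make the heavy-hitters claim concrete, here is a minimal Python sketch of a Misra-Gries (MG) summary with a merge step. It is an illustration under stated assumptions, not code from the paper: the function names (`mg_update`, `mg_build`, `mg_merge`) and the parameter `k` (the maximum number of counters, on the order of 1/ε) are hypothetical. The merge rule shown is the commonly described one: add the two summaries counter by counter, and if more than k counters remain, subtract the (k+1)-th largest count from every counter and discard those that are no longer positive.

```python
from collections import Counter


def mg_update(summary, item, k):
    """Process one stream item with a Misra-Gries summary of at most k counters."""
    if item in summary:
        summary[item] += 1
    elif len(summary) < k:
        summary[item] = 1
    else:
        # No free counter: decrement every counter and drop those reaching zero.
        for key in list(summary):
            summary[key] -= 1
            if summary[key] == 0:
                del summary[key]


def mg_build(stream, k):
    """Build an MG summary (dict of item -> estimated count) from an iterable."""
    summary = {}
    for item in stream:
        mg_update(summary, item, k)
    return summary


def mg_merge(a, b, k):
    """Merge two MG summaries into one with at most k counters."""
    combined = Counter(a)
    combined.update(b)  # counter-wise addition of the two summaries
    if len(combined) <= k:
        return dict(combined)
    # Subtract the (k+1)-th largest count and keep only positive counters.
    threshold = sorted(combined.values(), reverse=True)[k]
    return {key: c - threshold for key, c in combined.items() if c > threshold}
```

With k counters, each estimate undercounts the true frequency by at most n/(k+1), where n is the total number of items summarized, so k of roughly 1/ε gives an error of at most εn. A summary built per partition with `mg_build` and combined with `mg_merge` keeps the same size limit k as a summary built over the whole stream.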