Multi-query optimization for sketch-based estimation

Authors:
Alin Dobra;Minos Garofalakis;Johannes Gehrke;Rajeev Rastogi
Affiliations:
University of Florida, Gainesville, FL 32611, USA;Yahoo! Research, 2821 Mission College Blvd. Santa Clara, CA 95054, USA;4105B Upson Hall Cornell University Ithaca, NY 14853, USA;Bell Laboratories Rm 2B-301, 700 Mountain Avenue Murray Hill, NJ 07974, USA
Venue:
Information Systems
Year:
2009

Citing 24
Cited 4

Multiple-query optimization

ACM Transactions on Database Systems (TODS)
Resource allocation problems: algorithmic approaches

Resource allocation problems: algorithmic approaches
Introduction to algorithms

Introduction to algorithms
The space complexity of approximating the frequency moments

STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Tracking join and self-join sizes in limited storage

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Approximate computation of multidimensional aggregates of sparse data using wavelets

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Efficient and extensible algorithms for multi query optimization

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
NiagaraCQ: a scalable continuous query system for Internet databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Mining high-speed data streams

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
On computing correlated aggregates over continual data streams

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Space-efficient online computation of quantile summaries

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Estimating simple functions on the union of data streams

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Characterizing memory requirements for queries over continuous data streams

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing complex aggregate queries over data streams

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries

Proceedings of the 27th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports

Proceedings of the 27th International Conference on Very Large Data Bases
Finding Frequent Items in Data Streams

ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Counting Distinct Elements in a Data Stream

RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
An Approximate L1-Difference Algorithm for Massive Data Streams

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Clustering data streams

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Stable distributions, pseudorandom generators, embeddings and data stream computation

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
How to summarize the universe: dynamic maintenance of quantiles

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Approximating sliding windows by cyclic tree-like histograms for efficient range queries

Data & Knowledge Engineering
Energy-efficient query management scheme for a wireless sensor database system

EURASIP Journal on Wireless Communications and Networking - Special issue on theoretical and algorithmic foundations of wireless ad hoc and sensor networks
Regression on evolving multi-relational data streams

Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Distributed similarity estimation using derived dimensions

The VLDB Journal — The International Journal on Very Large Data Bases

Quantified Score

Hi-index	0.00

Visualization

Abstract

Randomized techniques, based on computing small ''sketch'' synopses for each stream, have recently been shown to be a very effective tool for approximating the result of a single SQL query over streaming data tuples. In this paper, we investigate the problems arising when data-stream sketches are used to process multiple such queries concurrently. We demonstrate that, in the presence of multiple query expressions, intelligently sharing sketches among concurrent query evaluations can result in substantial improvements in the utilization of the available sketching space and the quality of the resulting approximation error guarantees. We provide necessary and sufficient conditions for multi-query sketch sharing that guarantee the correctness of the result-estimation process. We also investigate the difficult optimization problem of determining sketch-sharing configurations that are optimal (e.g., under a certain error metric for a given amount of space). We prove that optimal sketch sharing typically gives rise to NP-hard questions, and we propose novel heuristic algorithms for finding good sketch-sharing configurations in practice. Results from our experimental study with queries from the TPC-H benchmark verify the effectiveness of our approach, clearly demonstrating the benefits of our sketch-sharing methodology.