Randomized Synopses for Query Assurance on Data Streams

Authors:
Ke Yi;Feifei Li;Marios Hadjieleftheriou;George Kollios;Divesh Srivastava
Affiliations:
Department of Computer Science and Engineering, HKUST, Clear Water Bay, Kowloon, Hong Kong. yike@cse.ust.hk;Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA. lifeifei@cs.fsu.edu;AT&T Labs-Research, Florham Park, NJ 07932, USA. marioh@research.att.com;Department of Computer Science, Boston University, Boston, MA 02215, USA. gkollios@cs.bu.edu;AT&T Labs-Research, Florham Park, NJ 07932, USA. divesh@research.att.com
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Abstract

The overwhelming flow of information in many data stream applications forces many companies to outsource the deployment of a Data Stream Management System (DSMS) to a third party, which then performs the desired computations. Remote computations intrinsically raise issues of trust, making query execution assurance on data streams a problem with practical implications. Consider a client that observes the same data stream as a remote server (e.g., network traffic), registers a continuous query on the server's DSMS, and receives answers upon request. The client needs to verify the integrity of the results using significantly fewer resources than evaluating the query locally would require. Toward that goal, we propose a probabilistic algorithm for selection and aggregate/group-by queries that uses constant space irrespective of the result-set size, has low update cost, and has an arbitrarily small probability of failure. We generalize this algorithm to tolerate a bounded number of errors (irrespective of error magnitude), and we also discuss the hardness of permitting arbitrary errors of small magnitude. Finally, we perform an empirical evaluation using live network traffic.
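
The abstract does not spell out the construction, so the following is only an illustrative sketch of how a constant-space randomized check of a group-by COUNT answer could work. It assumes a polynomial-identity fingerprint: the client picks a secret random point `alpha` in a prime field and maintains the product of `(alpha - group)` over every stream item. The class name `Synopsis`, the modulus `P`, and the integer group ids are all hypothetical choices made for this example, not details taken from the paper.

```python
import random

# Assumed field modulus: the Mersenne prime 2^61 - 1 (a hypothetical choice).
P = (1 << 61) - 1


class Synopsis:
    """Constant-space fingerprint of a vector of per-group counts.

    Illustrative sketch only: group ids are assumed to be integers in
    [0, P) and counts are assumed to be non-negative.
    """

    def __init__(self, seed=None):
        rng = random.Random(seed)
        self.alpha = rng.randrange(P)  # secret random evaluation point
        self.x = 1                     # running product, kept modulo P

    def update(self, group, count=1):
        # Fold `count` occurrences of `group` into the fingerprint.
        self.x = self.x * pow((self.alpha - group) % P, count, P) % P

    def verify(self, answer):
        # Recompute the fingerprint from the server's claimed counts
        # and compare it with the one maintained over the stream.
        y = 1
        for group, count in answer.items():
            y = y * pow((self.alpha - group) % P, count, P) % P
        return y == self.x


# The client sees the stream one item at a time and keeps only two integers.
client = Synopsis(seed=7)
for item in [3, 5, 3, 9, 5, 3]:
    client.update(item)

honest_answer = {3: 3, 5: 2, 9: 1}    # true counts per group
tampered_answer = {3: 3, 5: 2, 9: 2}  # one count inflated
print(client.verify(honest_answer), client.verify(tampered_answer))
```

In this sketch the client stores only `alpha` and `x`, regardless of how many groups appear in the result. A correct answer always passes, while an incorrect one passes only if `alpha` happens to be a root of the difference of two distinct polynomials, which under the stated assumptions occurs with probability at most the polynomial degree divided by `P`.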