Randomized Synopses for Query Assurance on Data Streams

  • Authors:
  • Ke Yi;Feifei Li;Marios Hadjieleftheriou;George Kollios;Divesh Srivastava

  • Affiliations:
  • Department of Computer Science and Engineering, HKUST, Clear Water Bay, Kowloon, Hong Kong. yike@cse.ust.hk;Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA. lifeifei@cs.fsu.edu;AT&T Labs-Research, Florham Park, NJ 07932, USA. marioh@research.att.com;Department of Computer Science, Boston University, Boston, MA 02215, USA. gkollios@cs.bu.edu;AT&T Labs-Research, Florham Park, NJ 07932, USA. divesh@research.att.com

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

The overwhelming flow of information in many data stream applications forces many companies to outsource the deployment of a Data Stream Management System (DSMS) to a third party for performing desired computations. Remote computations intrinsically raise issues of trust, making query execution assurance on data streams a problem with practical implications. Consider a client that observes the same data stream as a remote server (e.g., network traffic), registers a continuous query on the server's DSMS, and receives answers upon request. The client needs to verify the integrity of the results using significantly fewer resources than evaluating the query locally. Towards that goal, we propose a probabilistic algorithm for selection and aggregate/group-by queries that uses constant space irrespective of the result-set size, has low update cost, and has an arbitrarily small probability of failure. We generalize this algorithm to tolerate a bounded number of errors (irrespective of error magnitude), and we discuss the hardness of permitting arbitrary errors of small magnitude. We also perform an empirical evaluation using live network traffic.
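
The abstract does not spell out the construction, so the following is only a minimal sketch of one standard way to get constant-space verification of group-by counts: a polynomial-identity fingerprint evaluated at a secret random point. It is an assumption that this resembles the paper's synopsis. The class name `FingerprintSynopsis`, the modulus `P`, and the restriction to non-negative integer group ids and counts are all illustrative choices, not taken from the source.

```python
import random

# Assumption: a fixed Mersenne prime as the field modulus; group ids must be
# non-negative integers smaller than P.
P = (1 << 61) - 1


class FingerprintSynopsis:
    """Hypothetical constant-space fingerprint of per-group counts.

    Maintains prod_g (alpha - g)^count[g] mod P for a secret random alpha.
    Only two integers are stored, regardless of how many groups exist.
    """

    def __init__(self, seed=None):
        rng = random.Random(seed)
        self.alpha = rng.randrange(P)  # secret evaluation point
        self.value = 1                 # fingerprint of the empty stream

    def update(self, group, count=1):
        """Fold `count` occurrences of `group` into the fingerprint."""
        if count < 0:
            raise ValueError("this sketch only supports non-negative counts")
        base = (self.alpha - group) % P
        self.value = self.value * pow(base, count, P) % P

    def matches(self, claimed):
        """Check a claimed {group: count} answer against the fingerprint."""
        expected = 1
        for group, count in claimed.items():
            if count < 0:
                return False
            expected = expected * pow((self.alpha - group) % P, count, P) % P
        return expected == self.value
```

In use, the client would call `update` once per stream element and, when the server returns its answer, call `matches` on the reported per-group counts. A correct answer always passes. An incorrect answer corresponds to a different polynomial of degree at most the stream length n, so under these assumptions it passes with probability at most n / P over the choice of `alpha`.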