Space-efficient estimation of statistics over sub-sampled streams

  • Authors:
  • Andrew McGregor; A. Pavan; Srikanta Tirthapura; David Woodruff

  • Affiliations:
  • University of Massachusetts, Amherst, MA, USA; Iowa State University, Ames, IA, USA; Iowa State University, Ames, IA, USA; IBM Almaden, Almaden, CA, USA

  • Venue:
  • PODS '12: Proceedings of the 31st Symposium on Principles of Database Systems
  • Year:
  • 2012

Abstract

In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to sample a small fraction of the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, the quantities that need to be computed on the sampled stream are often different from the original quantities of interest and their estimation requires new algorithms. We present upper and lower bounds (often matching) for estimating frequency moments, support size, entropy, and heavy hitters of the original stream from the data observed in the sampled stream.
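
To illustrate why the quantity computed on the sampled stream can differ from the quantity of interest, here is a minimal sketch for the second frequency moment F2 (the sum of squared item frequencies). It assumes Bernoulli sampling, where each element is kept independently with probability p. The abstract does not state the sampling model, so that is an assumption, and the function names below are hypothetical. Under that assumption, the sampled moments G1 and G2 satisfy E[G1] = p·F1 and E[G2] = p²·F2 + p(1−p)·F1, so scaling G2 by 1/p² alone is biased and a correction term is needed. This is a textbook moment identity using exact counts, not the space-efficient algorithms of the paper.

```python
import random
from collections import Counter


def bernoulli_sample(stream, p, rng):
    """Keep each element independently with probability p (assumed sampling model)."""
    return [x for x in stream if rng.random() < p]


def f1(counts):
    """First frequency moment: total number of elements."""
    return sum(counts.values())


def f2(counts):
    """Second frequency moment: sum of squared item frequencies."""
    return sum(c * c for c in counts.values())


def naive_f2(sample_counts, p):
    """Biased estimate: rescales the sampled F2 and ignores the p(1-p)*F1 term."""
    return f2(sample_counts) / (p * p)


def estimate_f2(sample_counts, p):
    """Unbiased estimate of the original F2 under Bernoulli sampling.

    Uses E[G2] = p^2 * F2 + p(1-p) * F1 and E[G1] = p * F1, which give
    F2 = (E[G2] - (1-p) * E[G1]) / p^2.
    """
    g1 = f1(sample_counts)
    g2 = f2(sample_counts)
    return (g2 - (1 - p) * g1) / (p * p)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical skewed stream: item 0 is frequent, the rest are rare.
    stream = [0] * 500 + list(range(1, 501))
    p = 0.1
    sample_counts = Counter(bernoulli_sample(stream, p, rng))
    print("true F2:     ", f2(Counter(stream)))
    print("naive F2:    ", naive_f2(sample_counts, p))
    print("corrected F2:", estimate_f2(sample_counts, p))
```

A single run is noisy; the corrected estimator is unbiased only in expectation, so averaging over many independent samples is what brings it close to the true F2.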