Sampling from a moving window over streaming data

  • Authors: Brian Babcock; Mayur Datar; Rajeev Motwani

  • Affiliations: Stanford University, CA; Stanford University, CA; Stanford University, CA

  • Venue: SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms

  • Year: 2002

Abstract

We introduce the problem of sampling from a moving window of recent items from a data stream and develop two algorithms for this problem. The first algorithm, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample. The expected memory usage of our algorithm is O(k) when maintaining a sample of size k over a window of the n most recent elements from the data stream, and with high probability the algorithm requires no more than O(k log n) memory.

When the number of elements in the window is variable, as is the case when the size of the window is defined as a time duration rather than as a fixed number of data elements, the sampling problem becomes harder. Our second algorithm, "priority-sample", works even when the number of elements in the window can vary dynamically over time. With high probability, the "priority-sample" algorithm uses no more than O(k log n) memory.
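
The abstract does not spell out how "priority-sample" works, so the following is only a minimal sketch of one common reading of the idea, for a sample of size k = 1 over a time-based window. It assumes each arriving element is given an independent random priority, the sample is the unexpired element with the highest priority, and an element can be discarded once a later element has a priority at least as high. The class name `PrioritySampler`, its methods, and the convention that an element with timestamp `t` is active while `t > now - window` are all hypothetical choices made for this illustration, not taken from the paper.

```python
import random
from collections import deque


class PrioritySampler:
    """Illustrative single-element sampler over a time-based window."""

    def __init__(self, window, rng=None):
        self.window = window  # window length, in the same units as timestamps
        self.rng = rng if rng is not None else random.Random()
        # Stored (timestamp, priority, value) tuples, oldest first.
        # Priorities are strictly decreasing from front to back.
        self._items = deque()

    def _expire(self, now):
        # Drop elements that have fallen out of the window.
        while self._items and self._items[0][0] <= now - self.window:
            self._items.popleft()

    def add(self, timestamp, value):
        """Insert an element; timestamps are assumed to be non-decreasing."""
        priority = self.rng.random()
        # A stored element whose priority is not higher than the newcomer's
        # can never become the sample again: the newcomer outlives it.
        while self._items and self._items[-1][1] <= priority:
            self._items.pop()
        self._items.append((timestamp, priority, value))
        self._expire(timestamp)

    def sample(self, now):
        """Return the active element with the highest priority, or None."""
        self._expire(now)
        return self._items[0][2] if self._items else None

    def stored(self):
        """Number of elements currently held in memory."""
        return len(self._items)


if __name__ == "__main__":
    sampler = PrioritySampler(window=10, rng=random.Random(0))
    for t in range(100):
        sampler.add(t, f"item-{t}")
    print(sampler.sample(now=99), sampler.stored())
```

Under these assumptions, a sample of size k could be obtained by running k independent copies of the sampler, which yields a sample with replacement. Because the priorities are independent and identically distributed, each active element is equally likely to hold the maximum, so the returned element is uniform over the current window regardless of how many elements it contains.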