Maintaining stream statistics over sliding windows: (extended abstract)

  • Authors:
  • Mayur Datar; Aristides Gionis; Piotr Indyk; Rajeev Motwani

  • Affiliations:
  • Stanford University, Stanford CA; Stanford University, Stanford CA; MIT Laboratory for Computer Science, Cambridge, Massachusetts; Stanford University, Stanford CA

  • Venue:
  • SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2002

Abstract

We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: Given a stream of bits, maintain a count of the number of 1's in the last N elements seen from the stream. We show that using O((1/ε) log² N) bits of memory, we can estimate the number of 1's to within a factor of 1 + ε. We also give a matching lower bound of Ω((1/ε) log² N) memory bits for any deterministic or randomized algorithms. We extend our scheme to maintain the sum of the last N positive integers. We provide matching upper and lower bounds for this more general problem as well. We apply our techniques to obtain efficient algorithms for the Lp norms (for p ∈ [1, 2]) of vectors under the sliding window model. Using the algorithm for the basic counting problem, one can adapt many other techniques to work for the sliding window model, with a multiplicative overhead of O((1/ε) log N) in memory and a 1 + ε factor loss in accuracy. These include maintaining approximate histograms, hash tables, and statistics or aggregates such as sum and averages.
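
The abstract does not spell out the data structure, so the following is only a minimal Python sketch of one way to approach the basic counting problem: it groups 1's into buckets of power-of-two sizes and merges older buckets as newer ones arrive. The class name `SlidingWindowCounter`, its methods, and the choice of keeping up to k + 1 buckets per size (with k = ⌈1/ε⌉) are assumptions made for illustration; the constants are deliberately conservative and are not taken from the paper.

```python
from collections import deque
import math


class SlidingWindowCounter:
    """Approximate count of 1's among the last `window` bits (illustrative sketch)."""

    def __init__(self, window, eps):
        self.window = window
        # Assumption: allow up to k + 1 buckets of each size, k = ceil(1/eps).
        self.max_per_size = math.ceil(1 / eps) + 1
        # Each bucket is [timestamp of its newest 1, number of 1's it holds].
        # Newest bucket is on the left, oldest on the right.
        self.buckets = deque()
        self.time = 0
        self.total = 0  # sum of all bucket sizes

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.total -= self.buckets.pop()[1]
        if bit != 1:
            return
        self.buckets.appendleft([self.time, 1])
        self.total += 1
        self._merge()

    def _merge(self):
        b = self.buckets
        i = 0
        while i < len(b):
            size = b[i][1]
            j = i
            while j < len(b) and b[j][1] == size:
                j += 1  # b[i:j] is the run of buckets with this size
            if j - i <= self.max_per_size:
                break
            # Too many buckets of this size: merge the two oldest ones.
            # The merged bucket keeps the newer timestamp and doubles in size.
            b[j - 2][1] = 2 * size
            del b[j - 1]
            i = j - 2  # the merge may cascade to the next size

    def estimate(self):
        if not self.buckets:
            return 0
        # Only the oldest bucket can straddle the window boundary,
        # so count half of it and all of every newer bucket.
        return self.total - self.buckets[-1][1] // 2
```

In this sketch, calling `add(bit)` once per stream element and `estimate()` at any time returns a value intended to stay within a relative error of ε of the true count. The number of buckets grows roughly as (1/ε) log N, and each bucket stores a timestamp and a size, which is in line with the flavor of the O((1/ε) log² N) bit bound stated above.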