Maintaining stream statistics over multiscale sliding windows

  • Authors: Yishan Jiao

  • Affiliations: Institute of Computing Technology, Chinese Academy of Sciences and Graduate School, Chinese Academy of Sciences, Zhongguancun, Beijing, P.R. China

  • Venue: ACM Transactions on Database Systems (TODS)
  • Year: 2006

Abstract

In this article, we propose a new multiscale sliding window model which differentiates data items in different time periods of the data stream, based on a reasonable monotonicity-of-resolution assumption. Our model, as a well-motivated extension of the sliding window model, stands halfway between the traditional all-history and time-decaying models. We also present algorithms for estimating two significant data stream statistics, F0 (the number of distinct items) and Jaccard's similarity coefficient, with reasonable accuracy under the new model. Our algorithms use space logarithmic in the data stream size and linear in the number of windows; they support update time logarithmic in the number of windows and independent of the accuracy required. Our algorithms are easy to implement. Experimental results demonstrate the efficiency of our algorithms. Our techniques apply to scenarios in which universe sampling is used.
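
As a rough illustration of the ideas named in the abstract, the following Python sketch estimates F0 and Jaccard's similarity coefficient over several sliding window sizes using plain universe sampling. It is not the article's algorithm and does not achieve its space or update-time bounds: the sketch never evicts old entries, so its memory grows with the number of sampled items. The names `in_sample`, `WindowedSample`, `estimate_jaccard`, and `rate_bits` are hypothetical, and the two streams compared are assumed to advance in lockstep (one arrival each per time step).

```python
import hashlib


def in_sample(item, rate_bits):
    """Universe sampling: keep an item iff its hash is divisible by 2**rate_bits.

    Every occurrence of the same item gets the same decision, so the sample
    is a fixed subset of the universe, expected fraction 1 / 2**rate_bits.
    """
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << rate_bits) == 0


class WindowedSample:
    """Hypothetical helper: tracks the latest arrival time of sampled items."""

    def __init__(self, rate_bits):
        self.rate_bits = rate_bits
        self.last_seen = {}  # sampled item -> most recent arrival time
        self.now = 0         # arrivals seen so far (acts as the clock)

    def update(self, item):
        self.now += 1
        if in_sample(item, self.rate_bits):
            self.last_seen[item] = self.now

    def active(self, window):
        """Sampled items that occur among the last `window` arrivals."""
        cutoff = self.now - window
        return {x for x, t in self.last_seen.items() if t > cutoff}

    def estimate_f0(self, window):
        """Scale the sampled distinct count by the inverse sampling rate."""
        return len(self.active(window)) * (1 << self.rate_bits)


def estimate_jaccard(a, b, window):
    """Jaccard coefficient of the two windows, restricted to the sample."""
    sa, sb = a.active(window), b.active(window)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Usage: one structure answers queries for several window sizes.
stream = WindowedSample(rate_bits=0)  # rate_bits=0 samples everything (exact)
for x in [1, 2, 3, 1, 1]:
    stream.update(x)
estimates = {w: stream.estimate_f0(w) for w in (2, 3, 5)}
```

Because both streams apply the same hash-based sampling rule, an item is either sampled in both or in neither, which is what lets the sampled intersection and union stand in for the true ones in the Jaccard estimate.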