A simple, yet effective and efficient, sliding window sampling algorithm

Authors:
Xuesong Lu;Wee Hyong Tok;Chedy Raissi;Stéphane Bressan
Affiliations:
School of Computing, National University of Singapore;Microsoft Corporation;School of Computing, National University of Singapore;School of Computing, National University of Singapore
Venue:
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Year:
2010

Citing 9
Cited 0

Random sampling with a reservoir

ACM Transactions on Mathematical Software (TOMS)
Sampling from a moving window over streaming data

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Sampling Large Databases for Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Dynamic sample selection for approximate query processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Approximate Aggregation Techniques for Sensor Databases

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
A Sampling-Based Approach to Optimizing Top-k Queries in Sensor Networks

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
A dip in the reservoir: maintaining sample synopses of evolving datasets

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
On biased reservoir sampling in the presence of stream evolution

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Sampling for Sequential Pattern Mining: From Static Databases to Data Streams

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Sampling streams of continuous data with limited memory, or reservoir sampling, is a utility algorithm. Standard reservoir sampling maintains a random sample of the entire stream as it has arrived so far. This restriction does not meet the requirement of many applications that need to give preference to recent data. The simplest algorithm for maintaining a random sample of a sliding window reproduces periodically the same sample design. This is also undesirable for many applications. Other existing algorithms are using variable size memory, variable size samples or maintain biased samples and allow expired data in the sample. We propose an effective algorithm, which is very simple and therefore efficient, for maintaining a near random fixed size sample of a sliding window. Indeed our algorithm maintains a biased sample that may contain expired data. Yet it is a good approximation of a random sample with expired data being present with low probability. We analytically explain why and under which parameter settings the algorithm is effective. We empirically evaluate its performance (effectiveness) and compare it with the performance of existing representatives of random sampling over sliding windows and biased sampling algorithm.