Adaptive-Size Reservoir Sampling over Data Streams

  • Authors:
  • Mohammed Al-Kateb;Byung Suk Lee;X. Sean Wang

  • Affiliations:
  • The University of Vermont, USA;The University of Vermont, USA;The University of Vermont, USA

  • Venue:
  • SSDBM '07 Proceedings of the 19th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir. There are situations, however, in which it is necessary and/or advantageous to adaptively adjust the size of a reservoir in the middle of sampling due to changes in data characteristics and/or application behavior. This paper studies adaptivesize reservoir sampling over data streams considering two main factors: reservoir size and sample uniformity. First, the paper conducts a theoretical study on the effects of adjusting the size of a reservoir while sampling is in progress. The theoretical results show that such an adjustment may bring a negative impact on the probability of the sample being uniform (called uniformity confidence herein). Second, the paper presents a novel algorithm for maintaining the reservoir sample after the reservoir size is adjusted such that the resulting uniformity confidence exceeds a given threshold. Third, the paper extends the proposed algorithm to an adaptive multi-reservoir sampling algorithm for a practical application in which samples are collected from memory-limited wireless sensor networks using a mobile sink. Finally, the paper empirically examines the adaptivity of the multi-reservoir sampling algorithm with regard to reservoir size and sample uniformity using real sensor networks data sets.