Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n)))

  • Authors: Kim-Hung Li
  • Affiliations: Chinese Univ. of Hong Kong, Shatin, N.T., Hong Kong
  • Venue: ACM Transactions on Mathematical Software (TOMS)
  • Year: 1994

Abstract

One-pass algorithms for sampling n records without replacement from a population of unknown size N are known as reservoir-sampling algorithms. In this article, Vitter's reservoir-sampling algorithm, algorithm Z, is modified to give a more efficient algorithm, algorithm K. Additionally, two new algorithms, algorithm L and algorithm M, are proposed. If the time for scanning the population is ignored, all four algorithms have expected CPU time O(n(1 + log(N/n))), which is optimal up to a constant factor. Expressions for the expected CPU time of the algorithms are presented. Among the four, algorithm L is the simplest, and algorithm M is the most efficient when n and N/n are large and N is O(n^2).
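
The abstract does not spell out the steps of algorithm L, so the following is only a minimal sketch of the form in which it is commonly described: fill the reservoir with the first n records, then repeatedly draw a geometrically distributed skip length, jump over that many records, and overwrite a random reservoir slot. The names `reservoir_sample_l` and `_open_unit` are hypothetical, and floating-point edge cases (for example, extremely long streams) are only lightly guarded.

```python
import itertools
import math
import random

_END = object()  # sentinel marking an exhausted stream


def _open_unit(rng):
    """Draw a uniform variate in the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def reservoir_sample_l(stream, n, rng=None):
    """Sample n records without replacement from an iterable of unknown length.

    Sketch of the skip-based scheme usually attributed to algorithm L.
    If the stream holds fewer than n records, all of them are returned.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = rng or random.Random()
    it = iter(stream)

    # Step 1: the first n records fill the reservoir.
    reservoir = list(itertools.islice(it, n))
    if len(reservoir) < n:
        return reservoir

    # w tracks the largest of n uniform "keys" implicitly held by the reservoir.
    w = math.exp(math.log(_open_unit(rng)) / n)
    while True:
        # Step 2: number of records to skip before the next replacement.
        skip = math.floor(math.log(_open_unit(rng)) / math.log1p(-w))
        # Step 3: jump over `skip` records and read the one that follows.
        record = next(itertools.islice(it, skip, skip + 1), _END)
        if record is _END:
            return reservoir
        # Step 4: the new record replaces a uniformly chosen reservoir slot.
        reservoir[rng.randrange(n)] = record
        w *= math.exp(math.log(_open_unit(rng)) / n)
```

A call such as `reservoir_sample_l(range(1_000_000), 10)` returns ten distinct values from the range. Because most records are skipped rather than examined one by one, the number of random variates drawn grows with n(1 + log(N/n)) rather than with N, which is the behaviour the abstract's time bound refers to.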