Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n)))

Authors: Kim-Hung Li
Affiliations: Chinese Univ. of Hong Kong, Shatin, N.T., Hong Kong
Venue: ACM Transactions on Mathematical Software (TOMS)
Year: 1994

Citing 4

Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS)
A guide to simulation (2nd ed.)
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms
Generating beta variates with nonintegral shape parameters. Communications of the ACM

Cited 18

Limiting Result Cardinalities for Multidatabase Queries Using Histograms. BNCOD 18 Proceedings of the 18th British National Conference on Databases: Advances in Databases
Subspace clustering for high dimensional categorical data. ACM SIGKDD Explorations Newsletter
Sampling search-engine results. WWW '05 Proceedings of the 14th international conference on World Wide Web
Weighted random sampling with a reservoir. Information Processing Letters
Sequential reservoir sampling with a nonuniform distribution. ACM Transactions on Mathematical Software (TOMS)
Random Sampling for Continuous Streams with Arbitrary Updates. IEEE Transactions on Knowledge and Data Engineering
Sampling streaming data with replacement. Computational Statistics & Data Analysis
Efficient measurement of data flow enabling communication-aware parallelisation. IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
A component model of spatial locality. Proceedings of the 2009 international symposium on Memory management
Optimal sampling from sliding windows. Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Virtual reuse distance analysis of SPECjvm2008 data locality. PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Weighted random sampling with a reservoir. Information Processing Letters
A comparison between approximate counting and sampling methods for frequent pattern mining on data streams. Intelligent Data Analysis
Discovery of frequent patterns in transactional data streams. Transactions on large-scale data- and knowledge-centered systems II
Discovery of frequent patterns in transactional data streams. Transactions on large-scale data- and knowledge-centered systems II
Optimal sampling from sliding windows. Journal of Computer and System Sciences
Discovery of locality-improving refactorings by reuse path analysis. HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Weighted k-means for density-biased clustering. DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery

Abstract

One-pass algorithms for sampling n records without replacement from a population of unknown size N are known as reservoir-sampling algorithms. In this article, Vitter's reservoir-sampling algorithm, algorithm Z, is modified to give a more efficient algorithm, algorithm K. Additionally, two new algorithms, algorithm L and algorithm M, are proposed. If the time for scanning the population is ignored, all four algorithms have expected CPU time O(n(1 + log(N/n))), which is optimal up to a constant factor. Expressions for the expected CPU time of the algorithms are presented. Among the four, algorithm L is the simplest, and algorithm M is the most efficient when n and N/n are large and N is O(n^2).
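
To make the abstract's description concrete, here is a minimal Python sketch of the skip-based scheme usually presented as algorithm L. The abstract gives no pseudocode, so this follows the commonly cited formulation rather than the article's own listing. The function name `reservoir_sample_l`, its parameters, and the `_MISSING` sentinel are illustrative assumptions. The idea is to jump over a geometrically distributed number of records between replacements, so random numbers are generated only about n(1 + log(N/n)) times rather than once per record.

```python
import math
import random
from itertools import islice

_MISSING = object()  # hypothetical sentinel marking the end of the stream


def reservoir_sample_l(stream, n, rng=None):
    """Sample n records without replacement from an iterable of unknown size.

    Sketch of the commonly cited form of algorithm L (an assumption, not
    the article's own listing). Returns all records if fewer than n exist.
    """
    rng = rng or random.Random()
    it = iter(stream)
    reservoir = list(islice(it, n))  # fill the reservoir with the first n records
    if n <= 0 or len(reservoir) < n:
        return reservoir

    def uniform():
        # Value in (0, 1], so math.log never receives 0.
        return 1.0 - rng.random()

    # w tracks the largest of n uniform "keys" currently held in the reservoir.
    w = math.exp(math.log(uniform()) / n)
    while True:
        # Number of records to skip before the next replacement (geometric).
        skip = math.floor(math.log(uniform()) / math.log1p(-w)) if w < 1.0 else 0
        item = next(islice(it, skip, None), _MISSING)
        if item is _MISSING:
            return reservoir  # stream exhausted
        reservoir[rng.randrange(n)] = item  # replace a uniformly chosen slot
        w *= math.exp(math.log(uniform()) / n)


sample = reservoir_sample_l(range(1_000_000), 5, random.Random(42))
print(sample)
```

This sketch still advances the iterator record by record inside `islice`, which corresponds to the scanning time the abstract sets aside; only the random-number generation and reservoir updates are counted in the O(n(1 + log(N/n))) bound.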