Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
A guide to simulation (2nd ed.)
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms
Generating beta variates with nonintegral shape parameters
Communications of the ACM
Limiting Result Cardinalities for Multidatabase Queries Using Histograms
BNCOD 18 Proceedings of the 18th British National Conference on Databases: Advances in Databases
Subspace clustering for high dimensional categorical data
ACM SIGKDD Explorations Newsletter
Sampling search-engine results
WWW '05 Proceedings of the 14th international conference on World Wide Web
Weighted random sampling with a reservoir
Information Processing Letters
Sequential reservoir sampling with a nonuniform distribution
ACM Transactions on Mathematical Software (TOMS)
Random Sampling for Continuous Streams with Arbitrary Updates
IEEE Transactions on Knowledge and Data Engineering
Sampling streaming data with replacement
Computational Statistics & Data Analysis
Efficient measurement of data flow enabling communication-aware parallelisation
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
A component model of spatial locality
Proceedings of the 2009 international symposium on Memory management
Optimal sampling from sliding windows
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Virtual reuse distance analysis of SPECjvm2008 data locality
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Discovery of frequent patterns in transactional data streams
Transactions on large-scale data- and knowledge-centered systems II
Optimal sampling from sliding windows
Journal of Computer and System Sciences
Discovery of locality-improving refactorings by reuse path analysis
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Weighted k-means for density-biased clustering
DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
One-pass algorithms for sampling n records without replacement from a population of unknown size N are known as reservoir-sampling algorithms. In this article, Vitter's reservoir-sampling algorithm, algorithm Z, is modified to give a more efficient algorithm, algorithm K. Additionally, two new algorithms, algorithm L and algorithm M, are proposed. If the time for scanning the population is ignored, all four algorithms have expected CPU time O(n(1 + log(N/n))), which is optimal up to a constant factor. Expressions for the expected CPU time of the algorithms are presented. Among the four, algorithm L is the simplest, and algorithm M is the most efficient when n and N/n are large and N is O(n^2).
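As an illustration of the one-pass idea, the following Python sketch follows the commonly cited formulation of algorithm L. It is not reproduced from the article, and it makes several assumptions: the function name `reservoir_sample_l`, the `rng` parameter, and the helper `unit` are hypothetical, and the population is assumed to arrive as an ordinary Python iterable. The reservoir is filled with the first n records; after that, a geometrically distributed number of records is skipped before each replacement, so the number of random draws grows with n(1 + log(N/n)) rather than with N.

```python
import itertools
import math
import random

_END = object()  # sentinel marking an exhausted stream


def reservoir_sample_l(stream, n, rng=None):
    """Return n records drawn uniformly without replacement from `stream`.

    Hypothetical helper for illustration; the population size N is never
    needed in advance.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = rng or random.Random()
    it = iter(stream)

    # Fill the reservoir with the first n records.
    reservoir = list(itertools.islice(it, n))
    if len(reservoir) < n:
        return reservoir  # population is smaller than the requested sample

    def unit():
        # Uniform draw on the open interval (0, 1), so log() is defined.
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        return u

    w = math.exp(math.log(unit()) / n)
    while True:
        # Number of records to pass over before the next replacement.
        # If w has rounded to 1.0, no record is skipped.
        skip = math.floor(math.log(unit()) / math.log1p(-w)) if w < 1.0 else 0
        record = next(itertools.islice(it, skip, None), _END)
        if record is _END:
            return reservoir
        # Replace a uniformly chosen slot with the selected record.
        reservoir[rng.randrange(n)] = record
        w *= math.exp(math.log(unit()) / n)
```

A call such as `reservoir_sample_l(range(1_000_000), 10)` returns 10 of the records. In this sketch the skipped records are still read from the iterator, which corresponds to the scanning time that the abstract sets aside.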