An efficient algorithm for sequential random sampling

Authors: Jeffrey Scott Vitter
Affiliations: Brown Univ., Providence, RI
Venue: ACM Transactions on Mathematical Software (TOMS)
Year: 1987

Citing 5
Cited 18

Faster methods for random sampling

Communications of the ACM
Sequential random sampling

ACM Transactions on Mathematical Software (TOMS)
Computer methods for sampling from the exponential and normal distributions

Communications of the ACM
A note on sampling a tape-file

Communications of the ACM
The Art of Computer Programming Volumes 1-3 Boxed Set

The Art of Computer Programming Volumes 1-3 Boxed Set

Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
An Improved Algorithm for Ordered Sequential Random Sampling

ACM Transactions on Mathematical Software (TOMS)
Mining long sequential patterns in a noisy environment

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Clustering High Dimensional Massive Scientific Datasets

Journal of Intelligent Information Systems
Limiting Result Cardinalities for Multidatabase Queries Using Histograms

BNCOD 18 Proceedings of the 18th British National Conference on Databases: Advances in Databases
Online maintenance of very large random samples

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Multi-scaling sampling: an adaptive sampling method for discovering approximate association rules

Journal of Computer Science and Technology
Quality-Aware Sampling and Its Applications in Incremental Data Mining

IEEE Transactions on Knowledge and Data Engineering
Maintaining very large random samples using the geometric file

The VLDB Journal — The International Journal on Very Large Data Bases
Online maintenance of very large random samples on flash storage

Proceedings of the VLDB Endowment
Feature-preserved sampling over streaming data

ACM Transactions on Knowledge Discovery from Data (TKDD)
Online maintenance of very large random samples on flash storage

The VLDB Journal — The International Journal on Very Large Data Bases
A new approach for generating efficient sample from market basket data

Expert Systems with Applications: An International Journal
Sampling ensembles for frequent patterns

FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part I
Differentially private projected histograms: construction and use for prediction

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Efficient processing of top-k join queries by attribute domain refinement

ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
In-situ sampling of a large-scale particle simulation for interactive visualization and analysis

EuroVis'11 Proceedings of the 13th Eurographics / IEEE - VGTC conference on Visualization
Taming massive distributed datasets: data sampling using bitmap indices

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing

Abstract

We examine several methods for drawing a sequential random sample of n records from a file containing N records. Method D is recommended for general use. The algorithm is on-line (so that CPU time can be overlapped with I/O), has a small constant memory requirement, and is easy to program. An improved implementation is detailed in the Appendix.
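
To make the problem statement concrete, the following is a minimal sketch of sequential sampling using the simple selection-sampling technique (often credited as Knuth's Algorithm S). It is not Method D, whose details are not given in this abstract. The function name `sequential_sample` and its parameters are illustrative assumptions. Each record is read once, in file order, and kept with probability (records still needed) / (records still unread), so exactly n of the N records are chosen and the sample comes out in its original order.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sequential_sample(
    records: Iterable[T],
    N: int,
    n: int,
    rng: Optional[random.Random] = None,
) -> Iterator[T]:
    """Yield n of the N records, in their original order.

    Illustrative selection sampling, NOT the paper's Method D.
    Assumes `records` really yields N items.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = rng or random.Random()

    remaining = N  # records not yet examined
    needed = n     # records still to be selected

    for record in records:
        if needed == 0:
            break  # sample is complete; stop reading early
        # Keep this record with probability needed / remaining.
        # randrange(remaining) is uniform on 0 .. remaining - 1.
        if rng.randrange(remaining) < needed:
            yield record
            needed -= 1
        remaining -= 1


if __name__ == "__main__":
    # Draw 5 of 20 record numbers; the result is always in ascending order.
    print(list(sequential_sample(range(20), N=20, n=5)))
```

This sketch draws one random number per record examined, so its running time grows with N rather than with n. It uses constant memory and can process records as they arrive, which matches the on-line, small-memory setting the abstract describes.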