Simple random sampling is a widely accepted basis for estimation from a population. When data arrive as a stream, the total population size grows continuously and only one pass through the data is possible. Reservoir sampling is a method of maintaining a fixed-size random sample from streaming data. Reservoir sampling without replacement has been studied extensively, and several algorithms with sub-linear time complexity exist. Reservoir sampling with replacement has been mentioned by some authors, but it has been studied very little and only linear algorithms exist. A with-replacement reservoir sampling algorithm of sub-linear time complexity is introduced. A thorough complexity analysis of several approaches to the with-replacement reservoir sampling problem is also provided.
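To make the two sampling settings concrete, the following is a minimal Python sketch of the basic one-pass techniques, not the sub-linear algorithm the abstract announces. The first function is the classic replace-at-random reservoir scheme for sampling without replacement; the second is a straightforward with-replacement variant that treats each of the n slots as an independent reservoir of size one. The function names and the `rng` parameter are illustrative assumptions, not taken from the source.

```python
import random


def reservoir_without_replacement(stream, n, rng=random):
    """One-pass uniform sample of up to n items, without replacement.

    Hypothetical helper: the first n items fill the reservoir; item t
    (1-based) then replaces a random slot with probability n / t.
    """
    reservoir = []
    for t, item in enumerate(stream, start=1):
        if t <= n:
            reservoir.append(item)
        else:
            j = rng.randrange(t)  # uniform integer in [0, t)
            if j < n:
                reservoir[j] = item
    return reservoir


def reservoir_with_replacement(stream, n, rng=random):
    """One-pass uniform sample of n items, with replacement.

    Hypothetical helper: each slot is an independent size-1 reservoir,
    so item t overwrites each slot independently with probability 1 / t.
    Every slot is examined for every item, so this is a naive baseline.
    """
    reservoir = []
    for t, item in enumerate(stream, start=1):
        if t == 1:
            reservoir = [item] * n
        else:
            for i in range(n):
                if rng.random() < 1.0 / t:
                    reservoir[i] = item
    return reservoir
```

Because the slots in the second function are independent, the same stream element may occupy several slots at once, which is exactly what sampling with replacement allows. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.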