Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Size-estimation framework with applications to transitive closure and reachability
Journal of Computer and System Sciences
Estimating flow distributions from sampled flow statistics
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Estimating arbitrary subset sums with few probes
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Learn more, sample less: control of volume and variance in network measurement
IEEE Transactions on Information Theory
Confidence intervals for priority sampling
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Summarizing data using bottom-k sketches
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Priority sampling for estimation of arbitrary subset sums
Journal of the ACM (JACM)
Confident estimation for multistage measurement sampling and aggregation
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Tighter estimation using bottom k sketches
Proceedings of the VLDB Endowment
Stream sampling for variance-optimal estimation of subset sums
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Leveraging discarded samples for tighter estimation of multiple-set aggregates
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Optimal sampling from sliding windows
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Distinct-value synopses for multiset operations
Communications of the ACM - A View of Parallel Computing
Coordinated weighted sampling for estimating aggregates over multiple weight assignments
Proceedings of the VLDB Endowment
On the variance of subset sum estimation
ESA'07 Proceedings of the 15th annual European conference on Algorithms
Get the most out of your sample: optimal unbiased estimators using partial information
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Detecting adversarial advertisements in the wild
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Optimal sampling from sliding windows
Journal of Computer and System Sciences
Efficient Stream Sampling for Variance-Optimal Estimation of Subset Sums
SIAM Journal on Computing
Estimating sum by weighted sampling
ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
Content placement via the exponential potential function method
IPCO'13 Proceedings of the 16th international conference on Integer Programming and Combinatorial Optimization
Bottom-k and priority sampling, set similarity and subset sums with minimal independence
Proceedings of the forty-fifth annual ACM symposium on Theory of computing
The priority sampling procedure of N. Duffield, C. Lund and M. Thorup is not only an exciting new approach to sampling weighted data streams; it has also proven highly successful in a variety of practical applications. We resolve the two major open issues related to its performance. First, we resolve the main conjecture of N. Alon, N. Duffield, C. Lund and M. Thorup in [1], which states that the standard deviation of the subset sum estimator obtained from k priority samples is upper bounded by W/√(k-1), where W denotes the actual subset sum being estimated. Although Alon et al. give an O(W/√(k-1)) upper bound on the standard deviation of the estimator, their bound cannot serve as a performance guarantee in an applied setting, because the constants arising in their proof are very large. Our constant cannot be improved. Second, we resolve the main conjecture of N. Duffield, C. Lund and M. Thorup in [7], which states that the variance of the priority sampling procedure is no larger than the variance of the threshold sampling procedure with sample size only one smaller. The significance of this conjecture is that threshold sampling is provably optimal within a very general class of sampling algorithms but, unlike priority sampling, is not practical. Our result therefore certifies that priority sampling achieves the unlikely feat of uniting mathematical elegance, (essential) optimality and applicability. Our proof is based on a new integral formula and on very finely tuned telescoping sums.
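The abstract does not spell out the sampling procedure itself. As a rough illustration only, the following Python sketch follows the usual description of priority sampling: each item of weight w receives the priority w/α for a uniform random α in (0, 1], the k items of highest priority are kept, and each kept item is assigned the weight estimate max(w, τ), where τ is the (k+1)-th highest priority. The function names `priority_sample` and `estimate_subset_sum`, the in-memory list of weights, and the assumption that all weights are positive are choices made for this sketch and do not come from the paper.

```python
import heapq
import random


def priority_sample(weights, k, rng):
    """Return {index: weight_estimate} for a priority sample of size k.

    Assumes every weight is positive. If there are at most k items,
    all of them are kept with their exact weights.
    """
    if k >= len(weights):
        return {i: w for i, w in enumerate(weights)}
    # Priority q_i = w_i / a_i, with a_i uniform in (0, 1].
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    prioritized = [(w / (1.0 - rng.random()), i) for i, w in enumerate(weights)]
    # Keep the k + 1 highest priorities; the last one is the threshold tau.
    top = heapq.nlargest(k + 1, prioritized)
    tau = top[k][0]
    # Each sampled item is estimated as max(w_i, tau); unsampled items as 0.
    return {i: max(weights[i], tau) for _, i in top[:k]}


def estimate_subset_sum(sample, subset):
    """Estimate the total weight of `subset` (a set of indices) from a sample."""
    return sum(est for i, est in sample.items() if i in subset)


if __name__ == "__main__":
    rng = random.Random(2006)
    weights = [float(w) for w in range(1, 21)]
    sample = priority_sample(weights, k=5, rng=rng)
    everything = set(range(len(weights)))
    print("true sum:", sum(weights))
    print("estimate:", estimate_subset_sum(sample, everything))
```

In this notation, the bound discussed in the abstract says that the standard deviation of `estimate_subset_sum` is at most W/√(k-1), where W is the true sum of the queried subset. A single run can still land far from W, so the printed estimate is best read as one draw from an unbiased estimator.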