With a priority sample from a set of weighted items, we can provide an unbiased estimate of the total weight of any subset. The strength of priority sampling is that it gives the best possible estimate variance on any set of input weights.

For a concrete subset, however, the variance of the weight estimate depends strongly on the total set of weights and on how the subset is distributed within that set. The variance is, for example, much smaller if the weights are heavy-tailed.

In this paper we show how to generate a confidence interval directly from a priority sample, thus complementing the weight estimates with concrete lower and upper bounds. In particular, we show how heavy a subset can plausibly be while remaining hidden, that is, when its priority estimate is zero.

Our confidence intervals for priority sampling are evaluated on real and synthetic data and compared with confidence intervals obtained with uniform sampling, weighted sampling with replacement, and threshold sampling.
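
The abstract does not spell out how a priority sample is drawn, so the following is a minimal sketch of the scheme as it is usually defined in the priority sampling literature: each item with weight w_i gets a priority w_i / alpha_i with alpha_i uniform in (0, 1], the k items of highest priority are kept, and each kept item is assigned the weight estimate max(w_i, tau), where tau is the (k+1)-th highest priority. The function names `priority_sample` and `estimate_subset` are illustrative, not taken from the paper, and the sketch covers only the point estimate, not the confidence intervals the paper develops.

```python
import heapq
import random


def priority_sample(weights, k, rng=None):
    """Draw a priority sample of size k from a list of non-negative weights.

    Returns (sample, tau), where sample maps an item index to its weight
    estimate and tau is the (k+1)-th highest priority (0.0 if there are
    at most k items, in which case every item is kept with its exact weight).
    """
    rng = rng or random.Random()
    priorities = []
    for i, w in enumerate(weights):
        # random() lies in [0, 1), so 1 - random() lies in (0, 1].
        alpha = 1.0 - rng.random()
        priorities.append((w / alpha, i))

    top = heapq.nlargest(k + 1, priorities)
    if len(top) <= k:
        tau, kept = 0.0, top
    else:
        tau, kept = top[k][0], top[:k]

    # Each sampled item is estimated as max(w_i, tau); unsampled items as 0.
    sample = {i: max(weights[i], tau) for _, i in kept}
    return sample, tau


def estimate_subset(sample, subset):
    """Estimate the total weight of `subset` (a set of item indices)."""
    return sum(est for i, est in sample.items() if i in subset)
```

A subset with no sampled items receives the estimate 0 from `estimate_subset`, which is the situation the abstract singles out: the point estimate alone says nothing about how much weight such a subset may still carry.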