Confidence intervals for priority sampling

Authors:
Mikkel Thorup
Affiliations:
AT&T Labs-Research, Florham Park, NJ
Venue:
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Year:
2006

Abstract

With a priority sample from a set of weighted items, we can provide an unbiased estimate of the total weight of any subset. The strength of priority sampling is that it gives the best possible estimate variance on any set of input weights.

For a concrete subset, however, the variance of the estimate of its weight depends strongly on the total set of weights and on the distribution of the subset within this set. The variance is, for example, much smaller if the weights are heavy-tailed.

In this paper we show how to generate a confidence interval directly from a priority sample, thus complementing the weight estimates with concrete lower and upper bounds. In particular, we can tell how heavy a subset could plausibly be while remaining hidden, that is, when the priority estimate for the subset is zero.

Our confidence intervals for priority sampling are evaluated on real and synthetic data and compared with confidence intervals obtained with uniform sampling, weighted sampling with replacement, and threshold sampling.
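For readers unfamiliar with the sampling scheme the abstract builds on, the following Python sketch illustrates the basic priority sampling estimator as it is commonly described in the literature: each item gets priority w_i/u_i for a uniform random u_i in (0, 1], the k items of highest priority are kept, and each kept item is estimated as max(w_i, tau), where tau is the (k+1)-th highest priority. This is only an illustration of the point estimates; it does not implement the confidence intervals developed in the paper, and the names `priority_sample` and `estimate_subset` are hypothetical.

```python
import heapq
import random


def priority_sample(weights, k, rng=None):
    """Draw a priority sample of size k (hypothetical helper, for illustration).

    Returns (sample, tau), where sample maps item index -> weight estimate
    and tau is the threshold, i.e. the (k+1)-th highest priority.
    """
    rng = rng or random.Random()
    prioritized = []
    for i, w in enumerate(weights):
        u = 1.0 - rng.random()          # uniform in (0, 1], avoids division by zero
        prioritized.append((w / u, i))  # priority q_i = w_i / u_i
    top = heapq.nlargest(k + 1, prioritized)
    # With k or fewer items there is no (k+1)-th priority: everything is
    # sampled and the estimates are exact, so the threshold is taken as 0.
    tau = top[k][0] if len(top) > k else 0.0
    sample = {i: max(weights[i], tau) for _, i in top[:k]}
    return sample, tau


def estimate_subset(sample, subset):
    """Estimate the total weight of `subset` (a set of item indices).

    Items that were not sampled contribute an implicit estimate of 0.
    """
    return sum(est for i, est in sample.items() if i in subset)


if __name__ == "__main__":
    weights = [1.0, 5.0, 2.0, 100.0, 3.0, 0.5, 40.0]
    sample, tau = priority_sample(weights, k=3, rng=random.Random(2006))
    print("threshold:", tau)
    print("estimate for items {0, 1, 2}:", estimate_subset(sample, {0, 1, 2}))
```

A subset with no sampled items receives the estimate 0 from this sketch even if its true weight is positive, which is the situation the abstract refers to when it asks how heavy a hidden subset can be.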