Estimating sum by weighted sampling

Authors:
Rajeev Motwani; Rina Panigrahy; Ying Xu
Affiliations:
Dept of Computer Science, Stanford University; Microsoft Research, Mountain View, CA; Dept of Computer Science, Stanford University
Venue:
ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
Year:
2007


Abstract

We study the classic problem of estimating the sum of n variables. The traditional uniform sampling approach requires a linear number of samples to provide any non-trivial guarantees on the estimated sum. In this paper we consider various sampling methods besides uniform sampling, in particular sampling a variable with probability proportional to its value, referred to as linear weighted sampling. If only linear weighted sampling is allowed, we show an algorithm for estimating the sum with Õ(√n) samples, and it is almost optimal in the sense that Ω(√n) samples are necessary for any reasonable sum estimator. If both uniform sampling and linear weighted sampling are allowed, we show a sum estimator with Õ(∛n) samples. More generally, we may allow general weighted sampling where the probability of sampling a variable is proportional to any function of its value. We prove a lower bound of Ω(∛n) samples for any reasonable sum estimator using general weighted sampling, which implies that our algorithm combining uniform and linear weighted sampling is an almost optimal sum estimator.
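
To make the notion of linear weighted sampling concrete, the following Python sketch draws variables with probability proportional to their values and contrasts that with plain uniform sampling. It is not the estimator from the paper. It is a simple harmonic-mean illustration that relies on two strong assumptions that the abstract does not make: every value is strictly positive, and n is known. Under those assumptions the expected value of 1/x over linear weighted samples equals n/S, where S is the true sum, so S can be approximated by n divided by the sample mean of 1/x. All function names are hypothetical.

```python
import random


def linear_weighted_sample(values, k, rng):
    """Draw k indices, each chosen with probability proportional to its value."""
    return rng.choices(range(len(values)), weights=values, k=k)


def uniform_estimate(values, k, rng):
    """Baseline: scale the mean of k uniformly sampled values by n."""
    n = len(values)
    picks = [values[rng.randrange(n)] for _ in range(k)]
    return n * sum(picks) / k


def harmonic_estimate(values, k, rng):
    """Illustrative estimator, NOT the paper's algorithm.

    Assumes every value is strictly positive and n is known.
    Under linear weighted sampling, E[1/x] = sum((x_i / S) * (1 / x_i)) = n / S,
    so S is approximated by n / mean(1/x) over the sampled values.
    """
    n = len(values)
    idx = linear_weighted_sample(values, k, rng)
    inv_mean = sum(1.0 / values[i] for i in idx) / k
    return n / inv_mean


if __name__ == "__main__":
    rng = random.Random(0)
    # One large value hidden among many small ones: uniform sampling
    # rarely sees the large value, so its estimate is unreliable.
    values = [1.0] * 9999 + [10000.0]
    print("true sum:          ", sum(values))
    print("uniform estimate:  ", uniform_estimate(values, 100, rng))
    print("harmonic estimate: ", harmonic_estimate(values, 5000, rng))
```

The skewed input in the demo is the kind of instance on which uniform sampling struggles, since a small uniform sample is unlikely to contain the single large value. The harmonic-mean sketch has its own weakness: very small values are rarely drawn but contribute large 1/x terms, so it offers no worst-case guarantee of the kind the abstract describes.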