We study the classic problem of estimating the sum of n variables. The traditional uniform sampling approach requires a linear number of samples to provide any non-trivial guarantee on the estimated sum. In this paper we consider sampling methods other than uniform sampling, in particular sampling a variable with probability proportional to its value, which we refer to as linear weighted sampling. If only linear weighted sampling is allowed, we give an algorithm that estimates the sum with Õ(√n) samples; it is almost optimal in the sense that Ω(√n) samples are necessary for any reasonable sum estimator. If both uniform sampling and linear weighted sampling are allowed, we give a sum estimator that uses Õ(∛n) samples. More generally, we may allow general weighted sampling, where the probability of sampling a variable is proportional to an arbitrary function of its value. We prove a lower bound of Ω(∛n) samples for any reasonable sum estimator using general weighted sampling, which implies that our algorithm combining uniform and linear weighted sampling is an almost optimal sum estimator.
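To make the linear weighted sampling setting concrete, the following is a minimal sketch of one collision-based estimator that fits it. It is an illustration under stated assumptions, not necessarily the algorithm analyzed in the paper: it assumes each sample reveals both the identity and the value of the drawn variable, and the function names (`linear_weighted_sample`, `collision_sum_estimate`) are hypothetical. The idea is that two independent samples land on the same variable i with probability (x_i/S)^2, so weighting each colliding pair by 1/x_i gives a quantity whose expectation is 1/S, where S is the unknown sum.

```python
import random
from collections import Counter


def linear_weighted_sample(values, k, rng):
    """Draw k indices i.i.d., index i with probability values[i] / sum(values)."""
    # The sampler itself sees all values; the estimator below only uses the samples.
    return rng.choices(range(len(values)), weights=values, k=k)


def collision_sum_estimate(values, k, rng):
    """Estimate sum(values) from k linear weighted samples (illustrative sketch).

    Returns None when no two samples collide, since the estimate is then undefined.
    """
    counts = Counter(linear_weighted_sample(values, k, rng))
    # A variable drawn c times yields c*(c-1)/2 colliding pairs; each pair on
    # variable i is weighted by 1/x_i, so the total has expectation pairs / S.
    weighted_collisions = sum(
        c * (c - 1) / 2 / values[i] for i, c in counts.items()
    )
    if weighted_collisions == 0:
        return None
    pairs = k * (k - 1) / 2
    return pairs / weighted_collisions


if __name__ == "__main__":
    rng = random.Random(0)
    values = [rng.randint(1, 10) for _ in range(10_000)]
    print("true sum:", sum(values))
    print("estimate:", collision_sum_estimate(values, 3_000, rng))
```

Collisions only begin to appear once the number of samples is on the order of √n (a birthday-paradox effect), which is consistent with the Õ(√n) and Ω(√n) bounds stated above. The sketch makes no claim about matching the paper's constants or error guarantees.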