Sampling algorithms: lower bounds and applications

  • Authors:
  • Ziv Bar-Yossef;Ravi Kumar;D. Sivakumar

  • Affiliations:
  • Computer Science Division, U.C. Berkeley, Berkeley, CA;IBM Almaden, 650 Harry Road, San Jose, CA;IBM Almaden, 650 Harry Road, San Jose, CA

  • Venue:
  • STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form \genfunc, where \domain and \range are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables x_i that any sampling algorithm needs to query to approximate f(x_1,\ldots,x_n).We define two quantitative properties of functions --- the it block sensitivity and the minimum Hellinger distance --- that give us techniques to prove lower bounds on the query complexity. These techniques are quite general, easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds.We also point out some connections between sampling and streaming algorithms and lossy compression schemes.