Analysis and application of adaptive sampling

  • Authors:
  • James F. Lynch

  • Affiliations:
  • Department of Mathematics and Computer Science, Box 5815, Clarkson University, Potsdam, NY

  • Venue:
  • PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2000

Abstract

An estimation algorithm for a query is a probabilistic algorithm that computes an approximation of the size (number of tuples) of the query. The main question studied is which classes of logically definable queries have fast estimation algorithms. Evidence from descriptive complexity theory is provided indicating that not all such queries have fast estimation algorithms. However, it is shown that on classes of structures of bounded degree, all first-order queries have fast estimation algorithms.

These estimation algorithms use a form of statistical sampling known as adaptive sampling. Several versions of adaptive sampling have been developed by other researchers. The original version has been surpassed in some ways by a newer version and by a more specialized Monte-Carlo algorithm. An analysis of the average run time of the original version is given, and the different algorithms are compared. The analysis is used to compute what appears to be the best known upper bound on the efficiency of the original algorithm. Also, contrary to what seems to be a commonly held opinion, the two methods of adaptive sampling are incomparable: which method is superior depends on the query being estimated and the criteria being applied. Lastly, adaptive sampling can be more efficient than the Monte-Carlo algorithm if knowledge about the maximum values of the data being sampled is available.
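The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea behind adaptive sampling as it is commonly described, not the paper's algorithm or its analysis. It assumes the query result splits into `n` disjoint parts whose sizes can be computed one at a time. Parts are drawn uniformly at random with replacement until the accumulated size reaches a threshold or a cap on the number of draws is hit, and the sample mean is then scaled by `n`. The names `adaptive_sample_estimate`, `size_of`, `threshold`, and `max_samples` are hypothetical and chosen for illustration.

```python
import random


def adaptive_sample_estimate(n, size_of, threshold, max_samples, rng=None):
    """Estimate the total size of n disjoint parts by adaptive sampling.

    Assumptions (not taken from the paper):
      - size_of(i) returns the non-negative size of part i, 0 <= i < n
      - sampling is uniform, with replacement
      - sampling stops once the accumulated size reaches `threshold`
        or `max_samples` draws have been made
    Returns (estimate, number_of_draws).
    """
    if n <= 0:
        return 0.0, 0
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    rng = rng or random.Random()

    total = 0
    draws = 0
    # The number of draws adapts to the data: large parts reach the
    # threshold quickly, small parts require more draws.
    while total < threshold and draws < max_samples:
        total += size_of(rng.randrange(n))
        draws += 1

    # Scale the sample mean (total / draws) up to all n parts.
    return n * total / draws, draws
```

As a usage illustration, when every part has the same size the estimate is exact whatever the random draws are: `adaptive_sample_estimate(100, lambda i: 3, threshold=30, max_samples=1000)` stops after 10 draws and returns an estimate of 300.0. The cap `max_samples` guards against inputs where most parts are empty and the threshold would otherwise be reached very slowly; how to pick the threshold and the cap is exactly the kind of question the analysis in the paper addresses, and this sketch makes no claim about good values.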