Analysis and application of adaptive sampling

Authors:
James F. Lynch
Affiliations:
Department of Mathematics and Computer Science, Box 5815, Clarkson University, Potsdam, NY
Venue:
Journal of Computer and System Sciences - Special issue on PODS 2000
Year:
2003

Abstract

An estimation algorithm for a query is a probabilistic algorithm that computes an approximation for the size (number of tuples) of the query. One class of estimation algorithms uses a form of statistical sampling known as adaptive sampling. Several versions of adaptive sampling have been developed by other researchers. The original version has been surpassed in some ways by a newer version and by a more specialized Monte-Carlo algorithm. An analysis of the cost of the original version is presented, and the different algorithms are compared. The analysis is used to derive an upper bound on the number of samples required by the original algorithm. Also, contrary to what seems to be a commonly held opinion, none of the algorithms is generally better than the other two: which algorithm is superior depends on the query being estimated and the criteria being applied. A further question studied is which classes of logically definable queries have fast estimation algorithms. Evidence from descriptive complexity theory is provided indicating that not all such queries have fast estimation algorithms. However, it is shown that on classes of structures of bounded degree, all first-order queries have fast estimation algorithms.
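
To make the idea of adaptive sampling concrete, the following is a minimal sketch of the general shape of such an estimator, not the specific algorithms analyzed in the paper. It assumes the query result can be split into `num_parts` disjoint parts whose individual sizes can be computed exactly (for a join, one part per tuple of the first relation). Parts are sampled uniformly at random until the accumulated size reaches a stopping threshold, so the number of samples adapts to the data. The names `adaptive_sample_estimate`, `part_size`, `threshold`, and `max_samples` are hypothetical, and the stopping values are free parameters here rather than the bounds derived in the paper.

```python
import random
from typing import Callable, Optional, Tuple


def adaptive_sample_estimate(
    num_parts: int,
    part_size: Callable[[int], int],
    threshold: float,
    max_samples: int,
    rng: Optional[random.Random] = None,
) -> Tuple[float, int]:
    """Estimate the total size of a query split into num_parts disjoint parts.

    Parts are drawn uniformly at random (with replacement) and their exact
    sizes are accumulated. Sampling stops as soon as the accumulated size
    reaches `threshold` or `max_samples` parts have been drawn, whichever
    comes first. Both stopping values are illustrative assumptions.

    Returns (estimate, number_of_samples_drawn).
    """
    if num_parts < 1 or max_samples < 1 or threshold <= 0:
        raise ValueError("num_parts, max_samples and threshold must be positive")
    rng = rng or random.Random()

    total = 0      # accumulated size of the sampled parts
    samples = 0    # number of parts sampled so far
    while total < threshold and samples < max_samples:
        i = rng.randrange(num_parts)   # pick one part uniformly at random
        total += part_size(i)          # compute its size exactly
        samples += 1

    # Scale the mean sampled part size up to all num_parts parts.
    return num_parts * total / samples, samples


if __name__ == "__main__":
    # Toy join R |x| S on a single key: part i is the set of S-tuples
    # that match the i-th tuple of R.
    r_keys = [k % 10 for k in range(1000)]
    s_keys = [k % 10 for k in range(500)]
    matches = {k: s_keys.count(k) for k in set(r_keys)}

    estimate, used = adaptive_sample_estimate(
        num_parts=len(r_keys),
        part_size=lambda i: matches[r_keys[i]],
        threshold=2000,
        max_samples=200,
        rng=random.Random(0),
    )
    print(f"estimated join size {estimate:.0f} from {used} samples")
```

The sample cap matters when most parts are empty: the accumulated size may then never reach the threshold, and the cap bounds the cost at the price of a less reliable estimate. This is one way to see why the better choice of algorithm can depend on the query being estimated.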