Analysis and application of adaptive sampling

Authors:
James F. Lynch
Affiliations:
Department of Mathematics and Computer Science, Box 5815, Clarkson University, Potsdam, NY
Venue:
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2000

Citing 5

A guided tour of Chernoff bounds
Information Processing Letters

Estimating the size of generalized transitive closures
VLDB '89 Proceedings of the 15th international conference on Very large data bases

Queries are easier than you thought (probably)
PODS '92 Proceedings of the eleventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Query size estimation by adaptive sampling
Selected papers of the 9th annual ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Size-estimation framework with applications to transitive closure and reachability
Journal of Computer and System Sciences

Cited 6

Sequential Sampling Algorithms: Unified Analysis and Lower Bounds
SAGA '01 Proceedings of the International Symposium on Stochastic Algorithms: Foundations and Applications

How Can Computer Science Contribute to Knowledge Discovery?
SOFSEM '01 Proceedings of the 28th Conference on Current Trends in Theory and Practice of Informatics Piestany: Theory and Practice of Informatics

Sequential Sampling Techniques for Algorithmic Learning Theory
ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory

Containment join size estimation: models and methods
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Venn Sampling: A Novel Prediction Technique for Moving Objects
ICDE '05 Proceedings of the 21st International Conference on Data Engineering

Sequential sampling techniques for algorithmic learning theory
Theoretical Computer Science - Algorithmic learning theory (ALT 2000)


Abstract

An estimation algorithm for a query is a probabilistic algorithm that computes an approximation for the size (number of tuples) of the query. The main question that is studied is which classes of logically definable queries have fast estimation algorithms. Evidence from descriptive complexity theory is provided that indicates not all such queries have fast estimation algorithms. However, it is shown that on classes of structures of bounded degree, all first-order queries have fast estimation algorithms.

These estimation algorithms use a form of statistical sampling known as adaptive sampling. Several versions of adaptive sampling have been developed by other researchers. The original version has been surpassed in some ways by a newer version and a more specialized Monte-Carlo algorithm. An analysis of the average run time of the original version is given, and the different algorithms are compared. The analysis is used to compute what appears to be the best known upper bound on the efficiency of the original algorithm. Also, contrary to what seems to be a commonly held opinion, the two methods of adaptive sampling are incomparable. Which method is superior depends on the query being estimated and the criteria that are being applied. Lastly, adaptive sampling can be more efficient than the Monte-Carlo algorithm if knowledge about the maximum values of the data being sampled is available.
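
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea behind adaptive sampling for query size estimation, not the algorithm analyzed in the paper. It assumes the query result can be split into `n` disjoint parts whose sizes can be computed one at a time, draws parts at random, and stops once the accumulated size reaches a threshold or a cap on the number of samples. The names `adaptive_sample_estimate`, `part_size`, `threshold`, and `max_samples` are hypothetical, and the stopping values are illustrative rather than the constants from any published analysis.

```python
import random
from typing import Callable


def adaptive_sample_estimate(
    n: int,
    part_size: Callable[[int], int],
    threshold: float,
    max_samples: int,
    rng: random.Random,
) -> float:
    """Estimate the total size of a query split into n disjoint parts.

    Illustrative assumptions (not taken from the paper):
    - part_size(i) returns the exact size of part i, for 0 <= i < n;
    - sampling is uniform with replacement;
    - threshold and max_samples are caller-chosen stopping values.
    """
    if n < 1 or threshold <= 0 or max_samples < 1:
        raise ValueError("need n >= 1, threshold > 0 and max_samples >= 1")

    total = 0  # accumulated size of the sampled parts
    m = 0      # number of samples drawn so far

    # The number of samples is not fixed in advance: it adapts to the
    # sizes observed, stopping early when sampled parts are large.
    while total < threshold and m < max_samples:
        i = rng.randrange(n)
        total += part_size(i)
        m += 1

    # Scale the sample mean (total / m) up to all n parts.
    return n * total / m
```

For example, with part sizes stored in a list `sizes`, a call such as `adaptive_sample_estimate(len(sizes), sizes.__getitem__, threshold=500, max_samples=200, rng=random.Random())` returns an estimate of `sum(sizes)`. The stopping rule is what makes the procedure adaptive: sparse data triggers more samples, up to the cap.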