Query size estimation by adaptive sampling (extended abstract)

Authors:
Richard J. Lipton;Jeffrey F. Naughton
Affiliations:
Department of Computer Science, Princeton University;Department of Computer Sciences, University of Wisconsin
Venue:
PODS '90 Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Year:
1990

Abstract

We present an adaptive, random sampling algorithm for estimating the size of general queries. The algorithm can be used for any query Q over a database D such that (1) for some n, the answer to Q can be partitioned into n disjoint subsets Q1, Q2, …, Qn; (2) for 1 ≤ i ≤ n, the size of Qi is bounded by some function b(D, Q); and (3) there is some algorithm by which we can compute the size of Qi, where i is chosen randomly. We consider the performance of the algorithm on three special cases: join queries, transitive closure queries, and general recursive Datalog queries.
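
The three conditions in the abstract can be illustrated with a minimal Python sketch for the join case. The abstract does not state the stopping rule, so the rule used here (keep sampling until the accumulated size reaches a threshold or a sample cap is hit) is an assumption, as are the names `adaptive_estimate`, `join_partition_sizer`, `threshold`, and `max_samples`. Treat it as an illustration of the general idea, not as the authors' algorithm. For a join of R and S, partition Qi is taken to be the set of result tuples produced by the i-th tuple of R.

```python
import random
from collections import Counter


def adaptive_estimate(n, partition_size, threshold, max_samples, rng):
    """Estimate |Q| as n times the mean size of randomly sampled partitions.

    Assumed stopping rule (not taken from the abstract): stop once the
    accumulated size reaches `threshold` or `max_samples` draws were made.
    """
    if threshold <= 0 or max_samples < 1:
        raise ValueError("threshold must be positive and max_samples >= 1")
    total = 0
    samples = 0
    while total < threshold and samples < max_samples:
        i = rng.randrange(n)        # pick a partition index uniformly at random
        total += partition_size(i)  # condition (3): |Qi| is computable for random i
        samples += 1
    return n * total / samples      # scale the sample mean up to all n partitions


def join_partition_sizer(r_keys, s_keys):
    """Return a function giving |Qi| for the equijoin of R and S.

    Qi holds the join results contributed by the i-th tuple of R, so its
    size is the number of S tuples that share that tuple's join key.
    """
    s_counts = Counter(s_keys)
    return lambda i: s_counts[r_keys[i]]
```

With `r_keys = [7] * 10` and `s_keys = [7] * 3`, every partition has size 3, so `adaptive_estimate(10, join_partition_sizer(r_keys, s_keys), threshold=6, max_samples=100, rng=random.Random(0))` stops after two draws and returns 30.0, the exact join size. On skewed data the estimate varies with the draws. That is where the bound b(D, Q) from condition (2) matters, since in this sketch it would be the natural quantity to scale `threshold` by.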