Practical selectivity estimation through adaptive sampling
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Error-constrained COUNT query evaluation in relational databases
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Sequential sampling procedures for query size estimation
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Fixed-precision estimation of join selectivity
PODS '93 Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
On the relative cost of sampling for join selectivity estimation
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Query size estimation by adaptive sampling
Selected papers of the 9th annual ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Bifocal sampling for skew-resistant join size estimation
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Selectivity and cost estimation for joins based on random sampling
Journal of Computer and System Sciences
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Random sampling for histogram construction: how much is enough?
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Statistical estimators for relational algebra expressions
Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
STHoles: a multidimensional workload-aware histogram
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Applying the golden rule of sampling for query estimation
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Improved bounds on the sample complexity of learning
Journal of Computer and System Sciences
The discrepancy method: randomness and complexity
The discrepancy method: randomness and complexity
Database Systems: The Complete Book
Database Systems: The Complete Book
Lectures on Discrete Geometry
Selectivity Estimation in the Presence of Alphanumeric Correlations
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Selectivity Estimation Using Homogeneity Measurement
Proceedings of the Sixth International Conference on Data Engineering
Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
ICICLES: Self-Tuning Samples for Approximate Query Answering
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Query Size Estimation Using Machine Learning
Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
Dynamic sample selection for approximate query processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
A multi-dimensional histogram for selectivity estimation and fast approximate query answering
CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research
Query Size Estimation for Joins Using Systematic Sampling
Distributed and Parallel Databases
Conditional selectivity for statistics on query expressions
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Histograms revisited: when are histograms the best approximation method for aggregates over joins?
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Synopses for query optimization: A space-complexity perspective
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2004
Techniques for Warehousing of Sample Data
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
New Sampling-Based Estimators for OLAP Queries
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
End-biased Samples for Join Cardinality Estimation
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
ISOMER: Consistent Histogram Construction Using Query Feedback
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
A dip in the reservoir: maintaining sample synopses of evolving datasets
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Consistent selectivity estimation via maximum entropy
The VLDB Journal — The International Journal on Very Large Data Bases
Optimized stratified sampling for approximate query processing
ACM Transactions on Database Systems (TODS)
Cardinality estimation using sample views with quality assurance
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Robust Stratified Sampling Plans for Low Selectivity Queries
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Understanding cardinality estimation using entropy maximization
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Relative (p,ε)-Approximations in Geometry
Discrete & Computational Geometry
Hi-index | 0.00 |
We develop a novel method, based on the statistical concept of VC-dimension, for evaluating the selectivity (output cardinality) of SQL queries - a crucial step in optimizing the execution of large scale database and data-mining operations. The major theoretical contribution of this work, which is of independent interest, is an explicit bound on the VC-dimension of a range space defined by all possible outcomes of a collection (class) of queries. We prove that the VC-dimension is a function of the maximum number of Boolean operations in the selection predicate, and of the maximum number of select and join operations in any individual query in the collection, but it is neither a function of the number of queries in the collection nor of the size of the database. We develop a method based on this result: given a class of queries, it constructs a concise random sample of a database, such that with high probability the execution of any query in the class on the sample provides an accurate estimate for the selectivity of the query on the original large database. The error probability holds simultaneously for the selectivity estimates of all queries in the collection, thus the same sample can be used to evaluate the selectivity of multiple queries, and the sample needs to be refreshed only following major changes in the database. The sample representation computed by our method is typically sufficiently small to be stored in main memory. We present extensive experimental results, validating our theoretical analysis and demonstrating the advantage of our technique when compared to complex selectivity estimation techniques used in PostgreSQL and the Microsoft SQL Server.