On the propagation of errors in the size of join results
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Multiple join size estimation by virtual domains (extended abstract)
PODS '93 Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
On the estimation of join result sizes
EDBT '94 Proceedings of the 4th international conference on extending database technology: Advances in database technology
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Self-tuning histograms: building histograms without looking at data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The maximum entropy approach and probabilistic IR models
ACM Transactions on Information Systems (TOIS)
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STHoles: a multidimensional workload-aware histogram
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Exploiting statistics on query expressions for optimization
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Correction to 'Automating Statistics Management for Query Optimizers'
IEEE Transactions on Knowledge and Data Engineering
Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
VLDB '88 Proceedings of the 14th International Conference on Very Large Data Bases
LEO - DB2's LEarning Optimizer
Proceedings of the 27th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Oracle Database 10g New Features: Oracle10g Reference for Advanced Tuning and Administration
Oracle Database 10g New Features: Oracle10g Reference for Advanced Tuning and Administration
Conditional selectivity for statistics on query expressions
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
CORDS: automatic discovery of correlations and soft functional dependencies
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Refined lexicon models for statistical machine translation using a maximum entropy approach
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Consistently estimating the selectivity of conjuncts of predicates
VLDB '05 Proceedings of the 31st international conference on Very large data bases
ISOMER: Consistent Histogram Construction Using Query Feedback
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Automated statistics collection in DB2 UDB
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Integrating a maximum-entropy cardinality estimator into DB2 UDB
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Brighthouse: an analytic data warehouse for ad-hoc queries
Proceedings of the VLDB Endowment
Uncertainty management in rule-based information extraction systems
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Automated SQL tuning through trial and (sometimes) error
Proceedings of the Second International Workshop on Testing Database Systems
Statistical structures for Internet-scale data management
The VLDB Journal — The International Journal on Very Large Data Bases
Structured annotations of web queries
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Link gain matrix estimation in distributed large-scale wireless networks
EURASIP Journal on Wireless Communications and Networking - Special issue on simulators and experimental testbeds design and development for wireless networks
Xplus: a SQL-tuning-aware query optimizer
Proceedings of the VLDB Endowment
ACM Transactions on Database Systems (TODS)
The VC-dimension of SQL queries and selectivity estimation through sampling
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Efficiently adapting graphical models for selectivity estimation
The VLDB Journal — The International Journal on Very Large Data Bases
Entropy-based histograms for selectivity estimation
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use of partial distribution information raises the possibility that multiple, non-equivalent selectivity estimates may be available for a given predicate. Current optimizers use cumbersome ad hoc methods to ensure that selectivities are estimated in a consistent manner. These methods ignore valuable information and tend to bias the optimizer toward query plans for which the least information is available, often yielding poor results. In this paper we present a novel method for consistent selectivity estimation based on the principle of maximum entropy (ME). Our method exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. Experiments with our prototype implementation in DB2 UDB show that use of the ME approach can improve the optimizer’s cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times. For almost all queries, these improvements are obtained while adding only tens of milliseconds to the overall time required for query optimization.