Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics (MVS) to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use of partial distribution information raises the possibility that multiple, non-equivalent selectivity estimates may be available for a given predicate. Current optimizers use ad hoc methods to ensure that selectivities are estimated in a consistent manner. These methods ignore valuable information and tend to bias the optimizer toward query plans for which the least information is available, often yielding poor results. In this paper we present a novel method for consistent selectivity estimation based on the principle of maximum entropy (ME). Our method efficiently exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. Our implementation using a prototype version of DB2 UDB shows that ME improves the optimizer's cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times.
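To make the maximum-entropy idea concrete, here is a minimal sketch in Python. It is not the DB2 implementation described above. It assumes a toy setting in which each of `n` predicates is either true or false for a row, so the joint distribution has `2**n` "atoms", and it uses a standard iterative proportional fitting loop to find the maximum-entropy distribution consistent with the known selectivities. The function names (`max_entropy_atoms`, `selectivity`) and the selectivity values in the usage example are hypothetical and chosen only for illustration.

```python
from itertools import product


def max_entropy_atoms(n, known, iterations=200):
    """Return a max-entropy distribution over the 2**n truth assignments.

    `known` maps a frozenset of predicate indices to the known selectivity
    of the conjunction of those predicates (assumed mutually consistent).
    """
    atoms = list(product((0, 1), repeat=n))
    # Start from the uniform distribution, which has maximum entropy
    # when nothing is known.
    x = {b: 1.0 / len(atoms) for b in atoms}
    for _ in range(iterations):
        for preds, target in known.items():
            # Current probability mass of atoms satisfying every predicate
            # in this conjunction.
            cur = sum(x[b] for b in atoms if all(b[i] for i in preds))
            if cur <= 0.0 or cur >= 1.0:
                continue  # cannot rescale a degenerate partition
            for b in atoms:
                if all(b[i] for i in preds):
                    x[b] *= target / cur
                else:
                    x[b] *= (1.0 - target) / (1.0 - cur)
    return x


def selectivity(x, preds):
    """Estimated selectivity of the conjunction of the given predicates."""
    return sum(p for b, p in x.items() if all(b[i] for i in preds))


# Only single-predicate selectivities known: the estimate reduces to the
# independence assumption (0.1 * 0.2).
single = max_entropy_atoms(2, {frozenset({0}): 0.1, frozenset({1}): 0.2})
s_indep = selectivity(single, {0, 1})

# A multivariate statistic on predicates 0 and 1 is also known; predicate 2
# has only its own selectivity, so it is treated as independent of the pair.
multi = max_entropy_atoms(3, {
    frozenset({0}): 0.1,
    frozenset({1}): 0.2,
    frozenset({0, 1}): 0.05,
    frozenset({2}): 0.5,
})
s_joint = selectivity(multi, {0, 1, 2})
```

In this sketch every known selectivity is honored at once, so the estimate for a conjunction does not depend on which subset of statistics an optimizer happens to pick. The exponential number of atoms makes this direct formulation practical only for small `n`.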