GORDIAN: efficient and scalable discovery of composite keys
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Consistent selectivity estimation via maximum entropy
The VLDB Journal — The International Journal on Very Large Data Bases
Selectivity estimation by batch-query based histogram and parametric method
ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63
Rk-hist: an r-tree based histogram for multi-dimensional selectivity estimation
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Self-tuning database systems: a decade of progress
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Detecting attribute dependencies from query feedback
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Workload-Aware Histograms for Remote Applications
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
A pay-as-you-go framework for query execution feedback
Proceedings of the VLDB Endowment
Multiplicative synopses for relative-error metrics
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Query optimizers: time to rethink the contract?
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
General Database Statistics Using Entropy Maximization
DBPL '09 Proceedings of the 12th International Symposium on Database Programming Languages
Maintenance strategies for routing indexes
Distributed and Parallel Databases
StatAdvisor: recommending statistical views
Proceedings of the VLDB Endowment
Consistent histograms in the presence of distinct value counts
Proceedings of the VLDB Endowment
Journal of Intelligent Information Systems
Understanding cardinality estimation using entropy maximization
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Hierarchically organized skew-tolerant histograms for geographic data objects
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Data generation using declarative constraints
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Efficient selectivity estimation by histogram construction based on subspace clustering
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
The VC-dimension of SQL queries and selectivity estimation through sampling
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Efficient construction of histograms for multidimensional data using quad-trees
Decision Support Systems
Understanding cardinality estimation using entropy maximization
ACM Transactions on Database Systems (TODS)
Worst-case optimal join algorithms: [extended abstract]
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Improving the accuracy of histograms for geographic data objects
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Sensitivity of self-tuning histograms: query order affecting accuracy and robustness
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Histograms as statistical estimators for aggregate queries
Information Systems
Issues in big data testing and benchmarking
Proceedings of the Sixth International Workshop on Testing Database Systems
Designing a database system for modern processing architectures
Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium
Entropy-based histograms for selectivity estimation
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Proceedings of the VLDB Endowment
Data & Knowledge Engineering
Hi-index | 0.00 |
Database columns are often correlated, so that cardinality estimates computed by assuming independence often lead to a poor choice of query plan by the optimizer. Multidimensional histograms can help solve this problem, but the traditional approach of building such histograms using a data scan often scales poorly and does not always yield the best histogram for a given workload. An attractive alternative is to gather feedback from the query execution engine about the observed cardinality of predicates and use this feedback as the basis for a histogram. In this paper we describe ISOMER, a new feedback-based algorithm for collecting optimizer statistics by constructing and maintaining multidimensional histograms. ISOMER uses the maximumentropy principle to approximate the true data distribution by a histogram distribution that is as "simple"as possible while being consistent with the observed predicate cardinalities. ISOMER adapts readily to changes in the underlying data, automatically detecting and eliminating inconsistent feedback information in an efficient manner. The algorithm controls the size of the histogram by retaining only the most "important" feedback. Our experiments indicate that, unlike previous methods for feedback-driven histogram maintenance, ISOMER imposes little overhead, is extremely scalable, and yields highly accurate cardinality estimates while using only a modest amount of storage.