ISOMER: Consistent Histogram Construction Using Query Feedback

Authors:
U. Srivastava;P. J. Haas;V. Markl;M. Kutsch;T. M. Tran
Affiliations:
Stanford University;IBM Almaden Research Center;IBM Almaden Research Center;IBM Germany;IBM Silicon Valley Lab
Venue:
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Abstract

Database columns are often correlated, so that cardinality estimates computed by assuming independence often lead to a poor choice of query plan by the optimizer. Multidimensional histograms can help solve this problem, but the traditional approach of building such histograms using a data scan often scales poorly and does not always yield the best histogram for a given workload. An attractive alternative is to gather feedback from the query execution engine about the observed cardinality of predicates and use this feedback as the basis for a histogram. In this paper we describe ISOMER, a new feedback-based algorithm for collecting optimizer statistics by constructing and maintaining multidimensional histograms. ISOMER uses the maximum-entropy principle to approximate the true data distribution by a histogram distribution that is as "simple" as possible while being consistent with the observed predicate cardinalities. ISOMER adapts readily to changes in the underlying data, automatically detecting and eliminating inconsistent feedback information in an efficient manner. The algorithm controls the size of the histogram by retaining only the most "important" feedback. Our experiments indicate that, unlike previous methods for feedback-driven histogram maintenance, ISOMER imposes little overhead, is extremely scalable, and yields highly accurate cardinality estimates while using only a modest amount of storage.
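
To make the maximum-entropy idea in the abstract concrete, the following is a minimal sketch, not the ISOMER implementation. It assumes a fixed set of histogram buckets and that each piece of feedback has already been mapped to the set of buckets its predicate covers, together with the observed selectivity (the fraction of rows matched). The function name `max_entropy_histogram`, its parameters, and the use of iterative proportional scaling as the solver are all illustrative assumptions; the abstract does not say how the maximum-entropy problem is solved.

```python
def max_entropy_histogram(num_buckets, feedback, sweeps=1000, tol=1e-12):
    """Fit bucket frequencies by iterative proportional scaling (illustrative).

    feedback: list of (bucket_index_set, observed_selectivity) pairs, where
    each set is a non-empty proper subset of range(num_buckets) and each
    selectivity lies strictly between 0 and 1 (assumptions of this sketch).
    """
    for buckets, target in feedback:
        if not 0 < len(buckets) < num_buckets or not 0.0 < target < 1.0:
            raise ValueError("sketch only handles proper subsets and 0 < selectivity < 1")

    # Start from the uniform distribution, i.e. the "simplest" histogram.
    freq = [1.0 / num_buckets] * num_buckets

    for _ in range(sweeps):
        worst = 0.0
        for buckets, target in feedback:
            current = sum(freq[i] for i in buckets)
            worst = max(worst, abs(current - target))
            # Rescale mass inside and outside the predicate so the constraint
            # holds exactly while the total mass stays equal to 1.
            inside = target / current
            outside = (1.0 - target) / (1.0 - current)
            freq = [f * (inside if i in buckets else outside)
                    for i, f in enumerate(freq)]
        if worst < tol:
            break
    return freq


# Four buckets; feedback says buckets {0, 1} hold 50% of the rows and
# bucket {0} alone holds 40%.
estimate = max_entropy_histogram(4, [({0, 1}, 0.5), ({0}, 0.4)])
```

In this example the fitted frequencies approach 0.4, 0.1, 0.25, 0.25: the two observations pin down buckets 0 and 1, and the remaining mass is spread evenly over buckets 2 and 3 because no feedback distinguishes them. The sketch assumes the feedback is mutually consistent; detecting and discarding inconsistent feedback, which the abstract attributes to ISOMER, is not shown here.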