Utilizing histogram information

Authors:
Hai Wang;Ken Sevcik
Affiliations:
Department of Computer Science, University of Toronto;Department of Computer Science, University of Toronto
Venue:
CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
Year:
2001

Citing 11
Cited 3

Optimal histograms for limiting worst-case error propagation in the size of join results

ACM Transactions on Database Systems (TODS)
Balancing histogram optimality and practicality for query result size estimation

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Approximating multi-dimensional aggregate range queries over real attributes

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
STHoles: a multidimensional workload-aware histogram

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Optimal Histograms with Quality Guarantees

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Histogram-Based Approximation of Set-Valued Query-Answers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Approximate Query Processing Using Wavelets

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Universality of Serial Histograms

VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases

A multi-dimensional histogram for selectivity estimation and fast approximate query answering

CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research
Structure choices for two-dimensional histogram construction

CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

Foundations and Trends in Databases

Abstract

Many commercial database management systems (e.g., DB2 and Oracle) use histograms of the value distributions of individual relation attributes to select good query execution plans. These histograms contain partial information about the actual distribution, such as which attribute values occur most frequently, how often each of them occurs, and which value falls at the kth quantile when the values are sorted.

In this paper, we quantitatively assess the information gain (or uncertainty reduction) due to each of these types of histogram information, both individually and in combination. Correspondingly, we observe how the accuracy of estimating the frequencies of individual values improves as each type of histogram information becomes available. We suggest guidelines for constructing histograms tailored to each attribute according to the characteristics of its value distribution.
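
As a rough illustration of the kind of comparison the abstract describes, the following Python sketch contrasts frequency estimates made with no histogram information against estimates made when the exact counts of the k most frequent values are stored. It is not the authors' method: the function names, the toy data, the uniform-spread assumption for values without stored counts, and the mean-absolute-error measure are all assumptions chosen for illustration, and quantile information is not modeled here.

```python
from collections import Counter

def uniform_estimate(values, domain):
    """No histogram information: spread the tuple count evenly over the domain."""
    n = len(values)
    return {v: n / len(domain) for v in domain}

def top_k_estimate(values, domain, k):
    """Keep exact counts for the k most frequent values (assumed stored in the
    histogram) and spread the remaining tuples evenly over the other values."""
    counts = Counter(values)
    top = dict(counts.most_common(k))
    rest_total = len(values) - sum(top.values())
    rest_domain = [v for v in domain if v not in top]
    estimate = {v: rest_total / len(rest_domain) for v in rest_domain}
    estimate.update(top)
    return estimate

def mean_abs_error(values, domain, estimate):
    """Average absolute difference between true and estimated frequencies."""
    counts = Counter(values)
    return sum(abs(counts[v] - estimate[v]) for v in domain) / len(domain)

# Hypothetical skewed attribute: five distinct values, 100 tuples.
values = [1] * 50 + [2] * 30 + [3] * 10 + [4] * 5 + [5] * 5
domain = [1, 2, 3, 4, 5]

err_none = mean_abs_error(values, domain, uniform_estimate(values, domain))
err_top2 = mean_abs_error(values, domain, top_k_estimate(values, domain, 2))
print(f"no histogram info: {err_none:.3f}")
print(f"top-2 frequencies: {err_top2:.3f}")
```

On skewed data such as this, storing a few high-frequency values removes most of the estimation error, whereas on a near-uniform attribute the same information would help little. That dependence on the shape of the distribution is the sort of effect the per-attribute guidelines mentioned in the abstract would need to account for.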