Utilizing histogram information

  • Authors:
  • Hai Wang; Ken Sevcik

  • Affiliations:
  • Department of Computer Science, University of Toronto; Department of Computer Science, University of Toronto

  • Venue:
  • CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 2001

Abstract

Many commercial database management systems (e.g., DB2 and Oracle) use histograms of the value distributions of individual relation attributes to select good query execution plans. These histograms contain partial information about the actual distribution, such as which attribute values occur most frequently, how often each one occurs, and what value occurs at the kth quantile when the values are sorted. In this paper, we quantitatively assess the information gain (or uncertainty reduction) due to each of these types of histogram information, both individually and in combination. Correspondingly, we observe how the accuracy of estimating the frequencies of individual values improves with the availability of each type of histogram information. We suggest guidelines for constructing histograms tailored to each attribute according to the characteristics of its value distribution.
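
To make the idea concrete, the following is a minimal sketch, not the authors' method, of how knowing the k most frequent values can improve per-value frequency estimates. It assumes a simple estimator in which the top-k counts are stored exactly and the remaining rows are spread uniformly over the remaining distinct values. The function names `estimate_frequencies` and `mean_abs_error` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def estimate_frequencies(values, k):
    """Estimate each value's count from a top-k frequent-value summary.

    Assumption (illustrative only): the k most frequent values are stored
    with exact counts, and all other rows are assumed to be spread
    uniformly over the remaining distinct values.
    """
    counts = Counter(values)
    top_k = dict(counts.most_common(k))  # exact counts for the k most frequent values
    rest_rows = len(values) - sum(top_k.values())
    rest_distinct = len(counts) - len(top_k)
    # Uniform estimate for every value outside the top-k summary.
    default = rest_rows / rest_distinct if rest_distinct else 0.0
    return {v: top_k.get(v, default) for v in counts}


def mean_abs_error(values, k):
    """Mean absolute error between true counts and the top-k based estimates."""
    counts = Counter(values)
    estimates = estimate_frequencies(values, k)
    return sum(abs(counts[v] - estimates[v]) for v in counts) / len(counts)


if __name__ == "__main__":
    # A small skewed attribute: 100 rows over 5 distinct values.
    column = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 6 + ["e"] * 4
    for k in (0, 1, 2):
        print(k, mean_abs_error(column, k))
```

On this skewed sample, the error with no frequent-value information (k = 0, a purely uniform estimate) is much larger than with the two most frequent values known, which is the kind of accuracy improvement the abstract refers to. Quantile information could be examined in the same way, but it is omitted here to keep the sketch short.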