Query Result Size Estimation Using a Novel Histogram-like Technique: The Rectangular Attribute Cardinality Map

  • Authors:
  • B. John Oommen; Murali Thiyagarajah

  • Venue:
  • IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
  • Year:
  • 1999

Abstract

Current database systems use histograms to approximate the frequency distributions of attribute values of relations. These histograms are used to efficiently estimate query result sizes and access plan costs. Even though they have been in use for nearly two decades, there have been no significant mathematical techniques (other than those used in statistics for traditional histogram approximations) to study them. In this paper, we introduce a new histogram-like approximation strategy, called the Rectangular Attribute Cardinality Map (R-ACM), that aims to approximate the density of the underlying attribute values using the philosophies of numerical integration.

In this new histogram-like approximation method, the density function within a given sector is approximated by a rectangular cell, where the height of the cell is chosen so as to guarantee that the actual probability density differs from the approximated one by at most a user-specified tolerance. Furthermore, unlike the two traditional histogram types, namely equi-width and equi-depth, the R-ACM is neither equi-width nor equi-depth. Analytically, we show that, for the R-ACM, the distribution of an attribute value within a sector is binomial.

This permits us to derive worst-case and average-case results for the estimation errors of the probability mass itself. Our theoretical results, which include rigorous maximum likelihood and expected-case analyses, together with an extensive set of experiments, demonstrate that the R-ACM scheme (which is essentially histogram-like) is much more accurate than traditional histograms for query result size estimation. Given its high accuracy and low construction cost, we hope that it could become an invaluable tool for query optimization in future database systems.
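The abstract does not spell out how sectors are formed, so the following is only a minimal sketch of the tolerance idea, not the authors' algorithm. It assumes (hypothetically) that the attribute domain is a run of consecutive integer indices with a known frequency per value, and that a value joins the current sector while its frequency stays within the tolerance `tau` of that sector's running mean. The names `Sector`, `build_racm`, and `estimate_range` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sector:
    start: int   # index of the first attribute value in the sector
    width: int   # number of consecutive attribute values covered
    total: int   # total number of tuples falling in the sector

    @property
    def height(self) -> float:
        # Height of the rectangular cell: mean frequency per value.
        return self.total / self.width


def build_racm(freqs: Sequence[int], tau: float) -> List[Sector]:
    """Assumed construction rule: extend the current sector while the next
    frequency is within tau of the sector's running mean; otherwise start
    a new sector. Sector widths and totals therefore vary with the data."""
    sectors: List[Sector] = []
    for i, f in enumerate(freqs):
        if sectors and abs(f - sectors[-1].height) <= tau:
            sectors[-1].width += 1
            sectors[-1].total += f
        else:
            sectors.append(Sector(start=i, width=1, total=f))
    return sectors


def estimate_range(sectors: Sequence[Sector], lo: int, hi: int) -> float:
    """Estimate the number of tuples whose value index lies in [lo, hi]
    by summing (overlapping width) * (cell height) over the sectors."""
    est = 0.0
    for s in sectors:
        overlap = min(hi, s.start + s.width - 1) - max(lo, s.start) + 1
        if overlap > 0:
            est += overlap * s.height
    return est
```

For example, `build_racm([10, 11, 9, 30, 31, 5], tau=2)` groups the values into sectors of widths 3, 2, and 1. Neither the widths nor the tuple counts are equal across sectors, which illustrates why such a structure is neither equi-width nor equi-depth.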