Rk-hist: an r-tree based histogram for multi-dimensional selectivity estimation

  • Authors:
  • Todd Eavis; Alex Lopez

  • Affiliations:
  • Concordia University, Montreal, PQ, Canada; Concordia University, Montreal, PQ, Canada

  • Venue:
  • Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)
  • Year:
  • 2007

Abstract

Database query engines typically rely on query size estimators to evaluate the potential cost of alternative query plans. In multi-dimensional database systems, such as those found in large data warehousing environments, these selectivity estimators often take the form of multi-dimensional histograms. While single-dimensional histograms have proven quite accurate, even in the presence of data skew, their multi-dimensional counterparts have generally been far less reliable. In this paper, we present a new histogram model based on an R-tree space partitioning. The localization of the R-tree boxes is in turn governed by a Hilbert space-filling curve, while a series of efficient area equalization heuristics restructures the initial boxes to improve bucket representation. Experimental results demonstrate significantly improved estimation accuracy relative to state-of-the-art alternatives, as well as superior consistency across a variety of record distributions.
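To make the general idea concrete, the sketch below shows a generic Hilbert-packed bucket histogram in two dimensions. It is not the authors' Rk-hist algorithm: it assumes integer grid coordinates, packs a fixed number of Hilbert-ordered points into each bucket (R-tree style), stores each bucket as a bounding box plus a record count, and estimates a range query by assuming records are uniform inside each box. The area equalization heuristics described in the abstract are deliberately omitted, and all names (`hilbert_index`, `build_buckets`, `estimate`, `per_bucket`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class Bucket:
    lo: Tuple[int, int]  # inclusive lower-left corner of the bounding box
    hi: Tuple[int, int]  # exclusive upper-right corner of the bounding box
    count: int           # number of records packed into this bucket


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the lower bits follow the curve.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def build_buckets(points: List[Point], order: int, per_bucket: int) -> List[Bucket]:
    """Sort by Hilbert key, then pack consecutive runs into bounding boxes."""
    ordered = sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))
    buckets = []
    for i in range(0, len(ordered), per_bucket):
        chunk = ordered[i:i + per_bucket]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        # Each point occupies the unit cell [x, x+1) x [y, y+1).
        buckets.append(Bucket((min(xs), min(ys)), (max(xs) + 1, max(ys) + 1), len(chunk)))
    return buckets


def estimate(buckets: List[Bucket], qlo: Tuple[int, int], qhi: Tuple[int, int]) -> float:
    """Estimated records in [qlo, qhi), assuming uniformity within each bucket."""
    total = 0.0
    for b in buckets:
        area = (b.hi[0] - b.lo[0]) * (b.hi[1] - b.lo[1])
        ox = max(0, min(b.hi[0], qhi[0]) - max(b.lo[0], qlo[0]))
        oy = max(0, min(b.hi[1], qhi[1]) - max(b.lo[1], qlo[1]))
        total += b.count * (ox * oy) / area
    return total
```

For a full 8 x 8 grid with `per_bucket=16`, the Hilbert ordering groups each 4 x 4 quadrant into one bucket, so `estimate(buckets, (0, 0), (2, 2))` returns 4.0. On skewed data the boxes become elongated or overlap, which is where the uniformity assumption breaks down and where restructuring heuristics such as those in the paper would come into play.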