Selectivity estimation of high dimensional window queries via clustering

  • Authors:
  • Christian Böhm; Hans-Peter Kriegel; Peer Kröger; Petra Linhart

  • Affiliations:
  • Institute for Computer Science, University of Munich, Germany (all authors)

  • Venue:
  • SSTD'05 Proceedings of the 9th international conference on Advances in Spatial and Temporal Databases
  • Year:
  • 2005

Abstract

Query optimization is an important functionality of modern database systems and is often based on estimating the selectivity of queries before actually executing them. Well-known techniques for estimating the result set size of a query are sampling-based and histogram-based solutions. Sampling-based approaches depend heavily on the size of the drawn sample, which causes a trade-off between the quality of the estimation and the time in which the estimation can be executed for large data sets. Histogram-based techniques eliminate this problem but are limited to low-dimensional data sets: they either assume that all attributes are independent, which is rarely true for real-world data, or else become very inefficient for high-dimensional data. In this paper we present the first multivariate parametric method for estimating the selectivity of window queries for large and high-dimensional data sets. We use clustering to compress the data by generating a precise model of the data using multivariate Gaussian distributions. Additionally, we show efficient techniques to evaluate a window query against the generated Gaussian distributions. Our experimental evaluation shows that this approach is significantly more efficient for multidimensional data than all previous approaches.
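To make the idea of evaluating a window query against a Gaussian model concrete, here is a minimal sketch. It is not the paper's evaluation technique. It assumes, for simplicity, that each cluster is an axis-parallel (diagonal-covariance) Gaussian. Under that assumption the probability mass inside an axis-aligned query window factorizes into a product of one-dimensional CDF differences, and the estimated selectivity is the weight-averaged mass over all clusters. Correlations between attributes are then captured only at the level of the mixture, not within a single cluster. The names `DiagonalGaussian`, `box_mass`, and `estimate_selectivity` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagonalGaussian:
    """One cluster of the (hypothetical) data model.

    Assumption: diagonal covariance, i.e. an axis-parallel Gaussian,
    described by a per-dimension mean and standard deviation.
    """
    weight: float       # fraction of all data points assigned to this cluster
    mean: List[float]   # per-dimension mean
    std: List[float]    # per-dimension standard deviation (> 0)


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def box_mass(g: DiagonalGaussian, lo: Sequence[float], hi: Sequence[float]) -> float:
    """Probability mass of one diagonal Gaussian inside the window [lo, hi].

    With a diagonal covariance the dimensions are independent within the
    cluster, so the mass is a product of 1-D CDF differences.
    """
    mass = 1.0
    for m, s, a, b in zip(g.mean, g.std, lo, hi):
        mass *= normal_cdf((b - m) / s) - normal_cdf((a - m) / s)
    return mass


def estimate_selectivity(model: Sequence[DiagonalGaussian],
                         lo: Sequence[float], hi: Sequence[float]) -> float:
    """Estimated fraction of the data set falling into the query window."""
    return sum(g.weight * box_mass(g, lo, hi) for g in model)


def estimate_result_size(model: Sequence[DiagonalGaussian], n_points: int,
                         lo: Sequence[float], hi: Sequence[float]) -> float:
    """Estimated result set size: selectivity times data set cardinality."""
    return n_points * estimate_selectivity(model, lo, hi)


if __name__ == "__main__":
    # Two well-separated clusters of equal weight in 2-D.
    model = [
        DiagonalGaussian(0.5, [0.0, 0.0], [1.0, 1.0]),
        DiagonalGaussian(0.5, [50.0, 50.0], [2.0, 2.0]),
    ]
    # A window around the first cluster should capture roughly half the data.
    print(estimate_selectivity(model, [-5.0, -5.0], [5.0, 5.0]))
```

The appeal of a parametric model is visible here: the cost of an estimate depends only on the number of clusters and dimensions, not on the size of the data set. For Gaussians with full covariance matrices the window mass no longer factorizes this way, which is where more elaborate evaluation techniques are needed.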