Cost models for distance joins queries using R-trees

  • Authors:
  • Antonio Corral;Yannis Manolopoulos;Yannis Theodoridis;Michael Vassilakopoulos

  • Affiliations:
  • Department of Languages and Computation, University of Almeria, Almeria, Spain;Department of Informatics, Aristotle University, Thessaloniki, Greece;Department of Informatics, University of Piraeus, Piraeus, Greece;Department of Informatics, Technological Educational Institute of Thessaloniki, Thessaloniki, Greece

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

The K-Closest-Pairs Query (K-CPQ), a type of distance join in spatial databases, discovers the K pairs of objects formed from two different datasets with the K smallest distances. Recently, branch-and-bound algorithms based on R-trees have been developed in order to answer K-CPQs efficiently. For query optimization purposes, analytical models are needed to estimate the processing cost of a specific query in order to evaluate alternative execution plans. In this paper, we combine techniques that have been used for the analysis of nearest neighbor and spatial join queries, and derive the performance cost (in terms of disk accesses) of K-CPQs using R-trees. Moreover, we present two interesting extensions of the cost model for K-CPQs, one exploiting the buffering management using R-trees and another for a second type of distance join, the so-called buffer queries. The proposed cost models are verified under a variety of distributions in 2-dimensional space on both synthetic and real datasets, shown to achieve accurate estimations of the measured experimental results.