Processing Distance Join Queries with Constraints

  • Authors:
  • Apostolos N. Papadopoulos; Alexandros Nanopoulos; Yannis Manolopoulos

  • Affiliations:
  • *Corresponding author: apostol@delab.csd.auth.gr; Data Engineering Research Laboratory, Department of Informatics, Aristotle University, Thessaloniki 54124, Greece; Data Engineering Research Laboratory, Department of Informatics, Aristotle University, Thessaloniki 54124, Greece

  • Venue:
  • The Computer Journal
  • Year:
  • 2006


Abstract

Distance join queries are used in many modern applications, such as spatial databases, spatiotemporal databases and data mining. One of the most common distance join queries is the closest-pair query (CPQ). Given two datasets $\mathcal{D}_{\mathcal{A}}$ and $\mathcal{D}_{\mathcal{B}}$, the CPQ retrieves the pair (a, b), where a ∈ $\mathcal{D}_{\mathcal{A}}$ and b ∈ $\mathcal{D}_{\mathcal{B}}$, that has the smallest distance among all such pairs. An extension of this problem is to retrieve the k closest pairs of objects (k-CPQ). In several cases spatial constraints are applied, and the retrieved object pairs must also satisfy these constraints. Although applying spatial constraints is a natural way to focus the search, such constraints have only recently been studied for the CPQ problem, and only under the restriction that $\mathcal{D}_{\mathcal{A}} = \mathcal{D}_{\mathcal{B}}$. In this work, we focus on constrained closest-pair queries between two distinct datasets $\mathcal{D}_{\mathcal{A}}$ and $\mathcal{D}_{\mathcal{B}}$, where objects from $\mathcal{D}_{\mathcal{A}}$ must be enclosed by a spatial region R. Several algorithms are presented and evaluated using real-life and synthetic datasets. An extensive performance evaluation shows that, among them, a heap-based method enhanced with batch capabilities outperforms the other approaches.
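To make the query semantics concrete, the following is a minimal brute-force sketch of a constrained k-CPQ. It assumes 2-D points, Euclidean distance, and an axis-aligned rectangle standing in for the region R; the names `constrained_k_cpq` and `in_region` are hypothetical. It is not one of the algorithms from the paper (which are not described here); it only illustrates what a correct answer looks like: the constraint filters objects of $\mathcal{D}_{\mathcal{A}}$ only, and a bounded max-heap keeps the k best pairs seen so far.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed region shape: (xmin, ymin, xmax, ymax)


def in_region(p: Point, r: Rect) -> bool:
    """True if point p lies inside the axis-aligned rectangle r (boundary inclusive)."""
    xmin, ymin, xmax, ymax = r
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def constrained_k_cpq(da: List[Point], db: List[Point], region: Rect, k: int):
    """Return up to k triples (dist, a, b), sorted by ascending distance.

    Brute-force nested loop (illustrative only): a must lie in `region`,
    b is unconstrained. A max-heap of size k (distances negated, since
    heapq is a min-heap) holds the k closest pairs found so far.
    """
    if k <= 0:
        return []
    heap = []  # entries: (-dist, a, b); heap[0] holds the largest kept distance
    for a in da:
        if not in_region(a, region):
            continue  # the spatial constraint applies only to objects from D_A
        for b in db:
            d = math.dist(a, b)
            if len(heap) < k:
                heapq.heappush(heap, (-d, a, b))
            elif d < -heap[0][0]:
                # closer than the current k-th best pair: evict the worst one
                heapq.heapreplace(heap, (-d, a, b))
    return sorted((-nd, a, b) for nd, a, b in heap)
```

For example, with `da = [(0, 0), (5, 5), (10, 10)]`, `db = [(1, 0), (7, 5), (20, 20)]`, `region = (4, 4, 11, 11)` and `k = 2`, the result contains the pairs `((5, 5), (7, 5))` at distance 2 and `((10, 10), (7, 5))` at distance √34. The globally closest pair `((0, 0), (1, 0))` is excluded because `(0, 0)` lies outside the region. This nested loop costs |D_A| · |D_B| distance computations, which is the overhead that index-based and heap-based methods aim to avoid.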