Processing Distance Join Queries with Constraints

Authors:
Apostolos N. Papadopoulos;Alexandros Nanopoulos;Yannis Manolopoulos
Affiliations:
*Corresponding author: apostol@delab.csd.auth.gr;Data Engineering Research Laboratory, Department of Informatics, Aristotle University Thessaloniki 54124, Greece;Data Engineering Research Laboratory, Department of Informatics, Aristotle University Thessaloniki 54124, Greece
Venue:
The Computer Journal
Year:
2006

Abstract

Distance join queries are used in many modern applications, such as spatial databases, spatiotemporal databases and data mining. One of the most common distance join queries is the closest-pair query (CPQ). Given two datasets $\mathcal{D}_{\mathcal{A}}$ and $\mathcal{D}_{\mathcal{B}}$, the CPQ retrieves the pair (a, b), where a ∈ $\mathcal{D}_{\mathcal{A}}$ and b ∈ $\mathcal{D}_{\mathcal{B}}$, having the smallest distance among all pairs of objects. An extension of this problem is to generate the k closest pairs of objects (k-CPQ). In several cases spatial constraints are applied, and the object pairs that are retrieved must also satisfy these constraints. Although applying spatial constraints seems a natural way to focus the search, such constraints have only recently been studied for the CPQ problem, and only under the restriction that $\mathcal{D}_{\mathcal{A}}$ = $\mathcal{D}_{\mathcal{B}}$. In this work, we focus on constrained closest-pair queries between two distinct datasets $\mathcal{D}_{\mathcal{A}}$ and $\mathcal{D}_{\mathcal{B}}$, where objects from $\mathcal{D}_{\mathcal{A}}$ must be enclosed by a spatial region R. Several algorithms are presented and evaluated using real-life and synthetic datasets. Among them, a heap-based method enhanced with batch capabilities outperforms the other approaches, as demonstrated by an extensive performance evaluation.
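
To make the query semantics concrete, the following is a minimal brute-force sketch of a constrained k-CPQ. It is not one of the algorithms evaluated in the paper (in particular, it is not the heap-based batch method and uses no spatial index); it only illustrates what the query returns. The function name `constrained_k_cpq`, the use of 2D points, Euclidean distance, and an axis-aligned rectangle for the region R are all assumptions made for illustration.

```python
import heapq
import math
from itertools import product


def constrained_k_cpq(data_a, data_b, region, k=1):
    """Brute-force constrained k closest pairs (illustrative only).

    data_a, data_b: lists of (x, y) points (assumed 2D).
    region: ((xmin, ymin), (xmax, ymax)), an assumed axis-aligned rectangle R.
    Returns up to k tuples (distance, a, b), sorted by increasing distance,
    where a comes from data_a and lies inside R, and b comes from data_b.
    """
    (xmin, ymin), (xmax, ymax) = region

    # Apply the spatial constraint: only objects of D_A enclosed by R qualify.
    inside = [a for a in data_a
              if xmin <= a[0] <= xmax and ymin <= a[1] <= ymax]

    # Enumerate every candidate pair with its Euclidean distance.
    pairs = ((math.dist(a, b), a, b) for a, b in product(inside, data_b))

    # Keep the k pairs with the smallest distance.
    return heapq.nsmallest(k, pairs)


if __name__ == "__main__":
    d_a = [(0, 0), (5, 5), (9, 9)]
    d_b = [(0.5, 0), (5, 7), (9, 10)]
    r = ((4, 4), (10, 10))
    for dist, a, b in constrained_k_cpq(d_a, d_b, r, k=2):
        print(dist, a, b)
```

In this toy input the unconstrained closest pair would be ((0, 0), (0.5, 0)), but (0, 0) lies outside R, so it is excluded from the result. The sketch examines every qualifying pair, which costs time proportional to the product of the dataset sizes; avoiding that cost is the motivation for index-based algorithms such as those the abstract describes.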