Distance join queries are used in many modern applications, such as spatial databases, spatiotemporal databases and data mining. One of the most common distance join queries is the closest-pair query (CPQ). Given two datasets $\mathcal{D}_{\mathcal{A}}$ and $\mathcal{D}_{\mathcal{B}}$, the CPQ retrieves the pair (a, b), where a ∈ $\mathcal{D}_{\mathcal{A}}$ and b ∈ $\mathcal{D}_{\mathcal{B}}$, with the smallest distance among all pairs of objects. An extension of this problem is to generate the k closest pairs of objects (k-CPQ). In several cases spatial constraints are applied, and the retrieved object pairs must also satisfy these constraints. Although spatial constraints are a natural way to focus the search, they have only recently been studied for the CPQ problem, and only under the restriction that $\mathcal{D}_{\mathcal{A}}$ = $\mathcal{D}_{\mathcal{B}}$. In this work, we focus on constrained closest-pair queries between two distinct datasets $\mathcal{D}_{\mathcal{A}}$ and $\mathcal{D}_{\mathcal{B}}$, where objects from $\mathcal{D}_{\mathcal{A}}$ must be enclosed by a spatial region R. Several algorithms are presented and evaluated using real-life and synthetic datasets. Among them, a heap-based method enhanced with batch capabilities outperforms the other approaches, as demonstrated by an extensive performance evaluation.
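To make the query definition concrete, the following is a minimal brute-force sketch of a constrained k-CPQ. It is not one of the algorithms evaluated in this work (those are not specified in the abstract); it only illustrates the semantics of the query. The assumptions are: objects are 2-D points, the distance is Euclidean, and the region R is an axis-aligned rectangle. The names `in_region` and `constrained_k_cpq` are hypothetical.

```python
import heapq
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed form of R: (xmin, ymin, xmax, ymax)


def in_region(p: Point, r: Rect) -> bool:
    """Return True if point p lies inside rectangle r (borders included)."""
    xmin, ymin, xmax, ymax = r
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def constrained_k_cpq(
    da: List[Point], db: List[Point], region: Rect, k: int
) -> List[Tuple[float, Point, Point]]:
    """Brute-force reference: the k closest pairs (a, b) with a in da,
    b in db, and a enclosed by region. Returns (distance, a, b) tuples
    in ascending order of distance."""
    if k <= 0:
        return []
    # heapq is a min-heap, so distances are negated: the root then holds
    # the largest distance among the k best pairs found so far.
    heap: List[Tuple[float, Point, Point]] = []
    for a in da:
        if not in_region(a, region):
            continue  # the spatial constraint applies only to objects of da
        for b in db:
            d = hypot(a[0] - b[0], a[1] - b[1])
            if len(heap) < k:
                heapq.heappush(heap, (-d, a, b))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, a, b))
    return sorted((-neg_d, a, b) for neg_d, a, b in heap)


if __name__ == "__main__":
    da = [(0, 0), (5, 5), (10, 10)]
    db = [(1, 0), (5, 6), (10, 10.5)]
    region = (0, 0, 6, 6)  # excludes (10, 10) from da
    for dist, a, b in constrained_k_cpq(da, db, region, k=2):
        print(dist, a, b)
```

In this example the unconstrained closest pair would be ((10, 10), (10, 10.5)), but the region excludes (10, 10), so the query returns pairs formed only from the two remaining points of the first dataset. The sketch examines every candidate pair, which is why index-based methods are of interest for large datasets.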