Algorithms for processing K-closest-pair queries in spatial databases

  • Authors:
  • A. Corral;Y. Manolopoulos;Y. Theodoridis;M. Vassilakopoulos

  • Affiliations:
  • Department of Languages and Computation, University of Almeria, 04120 Almeria, Spain;Department of Informatics, Aristotle University of Thessaloniki, 54006 Thessaloniki, Greece;Department of Informatics, University of Piraeus, 18534 Piraeus, Greece;Department of Informatics, Technological Educational Institute of Thessaloniki, P.O. Box 14561, 54101 Thessaloniki, Greece

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2004

Abstract

This paper addresses the problem of finding the K closest pairs between two spatial datasets (the so-called K closest pairs query, K-CPQ), where each dataset is stored in an R-tree. There are two different techniques for solving this kind of distance-based query. The first is the incremental approach, which returns the output elements one by one in ascending order of distance. The second is the non-incremental alternative, which returns the K elements of the result all together at the end of the algorithm. In this paper, based on distance functions between two MBRs (minimum bounding rectangles) in the multidimensional Euclidean space, we propose a pruning heuristic and two updating strategies for minimizing the pruning distance, and use them in the design of three non-incremental branch-and-bound algorithms for K-CPQ between spatial objects stored in two R-trees. Two of these algorithms are recursive, following a depth-first search strategy, and one is iterative, following a best-first traversal policy. The plane-sweep method and search ordering are used as optimization techniques for improving the naive approaches. In addition, a number of interesting extensions of the K-CPQ (K-Self-CPQ, Semi-CPQ, K-FPQ (the K-farthest pairs query), etc.) are discussed. An extensive performance study, based on experiments performed with real datasets, is also presented. A wide range of values for the basic parameters affecting the performance of the algorithms is examined in order to identify the most efficient algorithm for each setting of parameter values. Finally, an experimental study of the scalability of the proposed K-CPQ branch-and-bound algorithms with respect to the dataset size and the value of K is included.