Distributed Processing of Similarity Queries

  • Authors:
  • Apostolos N. Papadopoulos; Yannis Manolopoulos

  • Affiliations:
  • Data Engineering Research Lab., Department of Informatics, Aristotle University, 54006 Thessaloniki, Greece. apostol@delab.csd.auth.gr; Data Engineering Research Lab., Department of Informatics, Aristotle University, 54006 Thessaloniki, Greece. yannis@delab.csd.auth.gr

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2001

Abstract

Many modern applications in diverse fields demand the efficient manipulation of very large multidimensional datasets. Efficient and effective query processing techniques must therefore be developed to provide acceptable response times. In this paper, we study the processing of nearest-neighbor similarity queries in large distributed multidimensional databases, where objects are represented as vectors in a vector space and are distributed across a multi-computer environment. The departure from the centralized case brings a number of advantages and (unfortunately) a number of difficulties that need to be overcome. From this perspective, four query evaluation strategies are presented: Concurrent Processing (CP), Selective Processing (SP), Two-Phase Processing (2PP) and Probabilistic Processing (PRP). The proposed techniques are compared analytically and experimentally to identify the advantages of each one and the cases in which each should be applied. Experimental results demonstrate the performance of each method under different parameter values. We also investigate the impact of the derived data that must be maintained in order to process similarity queries efficiently.
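
To make the problem setting concrete, the following is a minimal sketch of a k-nearest-neighbor query over vectors partitioned across several servers. It is a generic baseline written for illustration, not a reproduction of CP, SP, 2PP or PRP, whose details the abstract does not give. The names `local_knn` and `distributed_knn`, the Euclidean distance, and the in-memory lists standing in for servers are all assumptions.

```python
import heapq
import math


def local_knn(points, query, k):
    """Return up to k (distance, point) pairs of one server closest to query."""
    scored = [(math.dist(p, query), p) for p in points]
    return heapq.nsmallest(k, scored)


def distributed_knn(servers, query, k):
    """Merge per-server candidate lists into the global k nearest neighbors.

    Each server contributes its own k best candidates, so the union is
    guaranteed to contain the global k nearest neighbors.
    """
    candidates = []
    # Hypothetical stand-in for remote calls; a real system would issue
    # these requests over the network, possibly in parallel.
    for points in servers:
        candidates.extend(local_knn(points, query, k))
    return heapq.nsmallest(k, candidates)


# Three hypothetical servers, each holding part of a 2-D dataset.
servers = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 1.0), (9.0, 9.0)],
    [(2.0, 2.0), (0.5, 0.0)],
]
result = distributed_knn(servers, (0.0, 0.0), 2)
nearest_points = [point for _, point in result]
```

In this baseline every server is contacted and each returns k candidates, so up to k times the number of servers objects are transmitted to answer a single query. That cost is one plausible reason to consider strategies that contact fewer servers or transmit fewer candidates.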