Efficient range query processing in metric spaces over highly distributed data

  • Authors:
  • Christos Doulkeridis; Akrivi Vlachou; Yannis Kotidis; Michalis Vazirgiannis

  • Affiliations:
  • Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway; Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway; Department of Informatics, Athens University of Economics and Business, Athens, Greece; Department of Informatics, Athens University of Economics and Business, Athens, Greece

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2009

Abstract

Similarity search in P2P systems has attracted a lot of attention recently, and several important applications, such as distributed image search, can profit from the proposed distributed algorithms. In this paper, we address the challenging problem of efficiently processing range queries in metric spaces, where data is horizontally distributed across a super-peer network. Our approach relies on SIMPEER (Doulkeridis et al. in Proceedings of VLDB, pp. 986-997, 2007), a framework that dynamically clusters peer data in order to build distributed routing information at the super-peer level. SIMPEER allows exact range and nearest neighbor queries to be evaluated in a distributed manner that reduces communication cost, network latency, bandwidth consumption and computational overhead at each individual peer. In this paper, we extend SIMPEER by focusing on efficient range query processing and by providing recall-based guarantees for the quality of the results retrieved so far. This is especially useful for range queries that produce result sets of high cardinality and incur high processing costs, when the complete result set would overwhelm the user. Our framework employs statistics to estimate an upper limit on the number of possible results of a range query, and each super-peer may decide not to propagate the query further, thereby reducing the scope of the search. We provide an experimental evaluation of our framework and show that our approach performs efficiently, even under a high degree of distribution.
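The idea of pruning with cluster statistics and stopping early under a recall guarantee can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the SIMPEER implementation: it assumes each super-peer keeps, per cluster, a centroid, a radius (the maximum distance from the centroid to any member) and a point count; it uses Euclidean distance as the metric; and it models the recall guarantee as the number of retrieved results divided by an upper bound on the total. The names `ClusterSummary`, `result_upper_bound` and `should_stop` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster statistics a super-peer might keep."""
    centroid: Sequence[float]
    radius: float  # max distance from the centroid to any member point
    count: int     # number of points in the cluster


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance (assumption); any metric obeying the triangle inequality works.
    return math.dist(a, b)


def may_contain_results(c: ClusterSummary, q: Sequence[float], r: float) -> bool:
    # Triangle inequality: for a member x, d(q, x) >= d(q, centroid) - radius.
    # So if d(q, centroid) > r + radius, no member can lie within distance r of q.
    return dist(q, c.centroid) <= r + c.radius


def result_upper_bound(clusters: List[ClusterSummary], q: Sequence[float], r: float) -> int:
    # Every answer belongs to a cluster that intersects the query ball,
    # so summing those clusters' counts never underestimates the result size.
    return sum(c.count for c in clusters if may_contain_results(c, q, r))


def recall_lower_bound(retrieved: int, upper_bound: int) -> float:
    # If at most upper_bound answers exist, we have found at least this fraction of them.
    if upper_bound == 0:
        return 1.0
    return min(1.0, retrieved / upper_bound)


def should_stop(retrieved: int, upper_bound: int, target_recall: float) -> bool:
    # Hypothetical rule: stop forwarding once the guaranteed recall meets the target.
    return recall_lower_bound(retrieved, upper_bound) >= target_recall


if __name__ == "__main__":
    clusters = [
        ClusterSummary(centroid=(0.0, 0.0), radius=1.0, count=50),
        ClusterSummary(centroid=(10.0, 0.0), radius=1.0, count=30),  # pruned
        ClusterSummary(centroid=(3.0, 0.0), radius=1.5, count=20),
    ]
    ub = result_upper_bound(clusters, q=(1.0, 0.0), r=1.0)
    print(ub, should_stop(63, ub, target_recall=0.8))
```

Because the bound only counts clusters whose ball intersects the query ball, it stays valid without contacting any peer; the trade-off is that a loose bound (large, sparsely populated clusters) makes the guaranteed recall conservative, so a super-peer may keep forwarding the query longer than strictly necessary.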