Efficient peer-to-peer semantic overlay networks based on statistical language models

  • Authors:
  • Alessandro Linari;Gerhard Weikum

  • Affiliations:
  • University of Bologna, Bologna, Italy;Max-Planck Institute for Informatics, Saarbrücken, Germany

  • Venue:
  • P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we address the query routing problem in peer-to-peer (P2P) information retrieval. Our system builds up on the idea of a Semantic Overlay Network (SON), in which each peer becomes neighbor of a small number of peers, chosen among those that are most similar to it. Peers in the network are represented by a statistical Language Model derived from their local data collections but, instead of using the non-metric Kullback-Leibler divergence to compute the similarity between them, we use a symmetrized and "metricized" related measure, the square root of the Jensen-Shannon divergence, which let us map the problem to a metric search problem. The search strategy exploits the triangular inequality to efficiently prune the search space and relies on a priority queue to visit the most promising peers first. To keep communications costs low and to perform an efficient comparison between Language Models, we devise a compression technique that builds on Bloom-filters and histograms and we provide error bounds for the approximation and a cost analysis for the algorithms used to build and maintain the SON.