Flood little, cache more: effective result-reuse in P2P IR systems

  • Authors:
  • Christian Zimmer;Srikanta Bedathur;Gerhard Weikum

  • Affiliations:
  • Department for Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany;Department for Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany;Department for Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany

  • Venue:
  • DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
  • Year:
  • 2008

Abstract

State-of-the-art Peer-to-Peer Information Retrieval (P2P IR) systems lack response-time guarantees, especially at scale. To address this issue, a number of techniques for caching multi-term inverted-list intersections and query results have been proposed recently. Although these enable speedy query evaluation with low network overhead, they fail to consider the potential of caching to improve result quality. In this paper, we propose a cache-aware query routing scheme that not only reduces the response delay for a query, but also presents an opportunity to improve result quality while keeping network usage low. We make three contributions. First, we develop a cache-aware, multi-round query routing strategy that balances query efficiency and result quality. Next, we propose to aggressively reuse the cached results of even subsets of a query, yielding an approximate caching technique that can drastically reduce bandwidth overhead, and we study the conditions under which such a scheme retains good result quality. Finally, we empirically evaluate these techniques on a fully functional P2P IR system, using a large-scale Wikipedia benchmark and both synthetic and real-world query workloads. Our results show that combining result caching with multi-round, cache-aware query routing can reduce network traffic by more than half while doubling result quality.
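
The idea of reusing cached results of query subsets can be illustrated with a small sketch. This is not the paper's algorithm: the class name SubsetResultCache, the in-memory dictionary, and the choice to merge subset results by summing scores are all assumptions made for illustration. The abstract does not specify how cached subset results are combined or how the cache is distributed across peers.

```python
class SubsetResultCache:
    """Toy result cache: maps a set of query terms to a ranked result list.

    Hypothetical illustration only; merging by score summation is an
    assumption, not the method described in the paper.
    """

    def __init__(self):
        # frozenset of terms -> list of (doc_id, score) pairs
        self._entries = {}

    def put(self, terms, results):
        self._entries[frozenset(terms)] = list(results)

    def lookup(self, terms, k=10):
        """Return (top-k results, exact_hit) for the given query terms."""
        query = frozenset(terms)
        if query in self._entries:
            # Exact hit: the cached list is returned unchanged.
            return self._entries[query][:k], True
        # Approximate answer: merge cached results of proper subsets.
        merged = {}
        for cached_terms, results in self._entries.items():
            if cached_terms < query:  # proper-subset test on frozensets
                for doc_id, score in results:
                    merged[doc_id] = merged.get(doc_id, 0.0) + score
        # Sort by descending score, breaking ties by doc_id for determinism.
        ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:k], False


cache = SubsetResultCache()
cache.put(["p2p", "search"], [("d1", 0.9), ("d2", 0.4)])
cache.put(["caching"], [("d2", 0.7), ("d3", 0.5)])

# No entry exists for the full three-term query, so subsets are merged.
results, exact = cache.lookup(["p2p", "search", "caching"], k=2)
print([doc_id for doc_id, _ in results], exact)
```

In this sketch, a query with no exact cache entry is answered without contacting any further peers, at the cost of an approximate ranking. Whether such an answer is good enough is the kind of condition the abstract says the paper studies.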