Two-Level Result Caching for Web Search Queries on Structured P2P Networks

  • Authors: Erika Rosas; Nicolas Hidalgo; Mauricio Marin

  • Venue: ICPADS '12: Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems
  • Year: 2012

Abstract

This paper proposes a two-level caching strategy for Web search queries designed to operate on P2P networks. The aim is to significantly reduce the query traffic sent from a large community of users to commercial search engines by placing between them a P2P caching service that stores the answers to frequent queries and distributes them efficiently among users. The design takes into account the highly dynamic nature of user queries, both in traffic intensity and in drastic shifts of user interest, which are usually driven by unpredictable world-wide events. Each peer maintains an LRU result cache (RCache) that keeps the answers to queries originating at the peer itself and to queries for which the peer is responsible; for the latter, the peer contacts a Web search engine on demand to obtain the answers. When query traffic is predominantly routed to a few responsible peers, our strategy replicates the role of "being responsible for" a query to neighboring peers so that they can absorb part of the traffic and restore load balance. This is a fairly slow, adaptive process that we call mid-term load balancing. To achieve a short-term fair distribution of queries, we introduce in each peer a location cache (LCache) that keeps pointers to peers that have requested the same queries in the very recent past. This lets those peers share their query answers with newly requesting peers. The process is fast because popular queries are usually cached at the first DHT hop of a requesting peer, which quickly redistributes load among more and more peers. A comparative study shows that the proposed strategy achieves better load balance, significantly lower communication volume among peers, and higher cache hit ratios than previous strategies.
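The lookup order described in the abstract (own RCache, then an LCache pointer to a recent requester, then the responsible peer's RCache, then the search engine) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the DHT is collapsed to a single hop where the responsible peer is chosen by hashing the query modulo the number of peers, the LCache lives at the responsible peer and remembers only the most recent requester per query (in the paper, popular entries are typically found at the first DHT hop), `search_engine` is a stub, and mid-term replication of responsibility is omitted. All class, method and parameter names (`LRUCache`, `Peer`, `Network`, `lookup`, `search_engine`, cache sizes) are hypothetical.

```python
from collections import OrderedDict
import hashlib


class LRUCache:
    """Fixed-capacity map that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


class Peer:
    def __init__(self, peer_id, rcache_size=64, lcache_size=64):
        self.peer_id = peer_id
        self.rcache = LRUCache(rcache_size)  # RCache: query -> answer
        self.lcache = LRUCache(lcache_size)  # LCache (assumed form): query -> id of a recent requester
        self.engine_calls = 0                # times this peer contacted the search engine


def search_engine(query):
    # Hypothetical stand-in for a commercial Web search engine.
    return f"results for {query!r}"


class Network:
    def __init__(self, n_peers):
        self.peers = [Peer(i) for i in range(n_peers)]

    def responsible(self, query):
        # Simplified single-hop DHT (assumption): hash the query, take it modulo the peer count.
        digest = hashlib.sha1(query.encode()).digest()
        return self.peers[int.from_bytes(digest[:8], "big") % len(self.peers)]

    def lookup(self, requester, query):
        """Return (answer, source), where source names the level that served it."""
        answer = requester.rcache.get(query)
        if answer is not None:
            return answer, "local RCache"

        owner = self.responsible(query)
        source = None
        holder_id = owner.lcache.get(query)
        if holder_id is not None and holder_id != requester.peer_id:
            # Follow the LCache pointer; the holder may have evicted the answer since.
            answer = self.peers[holder_id].rcache.get(query)
            if answer is not None:
                source = "LCache pointer"
        if answer is None:
            answer = owner.rcache.get(query)
            if answer is not None:
                source = "owner RCache"
            else:
                owner.engine_calls += 1
                answer = search_engine(query)
                owner.rcache.put(query, answer)
                source = "search engine"

        requester.rcache.put(query, answer)
        owner.lcache.put(query, requester.peer_id)  # point later requesters at this peer
        return answer, source
```

In this sketch, once one peer has fetched a query through the search engine, a second peer asking for the same query is served from the first peer's RCache via the LCache pointer, so neither the search engine nor the responsible peer's RCache is consulted again.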