Flood little, cache more: effective result-reuse in P2P IR systems

Authors:
Christian Zimmer;Srikanta Bedathur;Gerhard Weikum
Affiliations:
Department for Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany;Department for Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany;Department for Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany
Venue:
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
Year:
2008

Abstract

State-of-the-art Peer-to-Peer Information Retrieval (P2P IR) systems lack response-time guarantees, especially at scale. To address this issue, a number of techniques for caching multi-term inverted-list intersections and query results have been proposed recently. Although these enable fast query evaluation with low network overhead, they fail to consider the potential of caching to improve result quality. In this paper, we propose a cache-aware query routing scheme that not only reduces the response delay for a query but also presents an opportunity to improve result quality while keeping network usage low. We make three contributions. First, we develop a cache-aware, multi-round query routing strategy that balances query efficiency against result quality. Next, we propose to aggressively reuse the cached results of even subsets of a query, yielding an approximate caching technique that can drastically reduce bandwidth overhead, and we study the conditions under which such a scheme retains good result quality. Finally, we empirically evaluate these techniques on a fully functional P2P IR system, using a large-scale Wikipedia benchmark and both synthetic and real-world query workloads. Our results show that combining result caching with multi-round, cache-aware query routing can reduce network traffic by more than half while doubling result quality.
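The idea of reusing cached results for subsets of a query can be illustrated with a minimal sketch. This is not the paper's implementation: the class `SubsetResultCache`, its methods, and the "largest cached subset wins" policy are assumptions made for illustration only, and the abstract does not specify how cached entries are selected or merged.

```python
from itertools import combinations


class SubsetResultCache:
    """Toy result cache keyed by the set of query terms (hypothetical)."""

    def __init__(self):
        # Maps a frozenset of terms to the list of cached result ids.
        self._store = {}

    def put(self, terms, results):
        self._store[frozenset(terms)] = list(results)

    def lookup(self, terms):
        """Return (matched_terms, results) for the largest cached subset.

        An exact hit is the case where the matched subset is the whole
        query. Returns (None, None) when no subset of the query is cached.
        """
        query = frozenset(terms)
        # Try larger subsets first so the closest approximation wins.
        # Assumption: queries are short, so enumerating subsets is cheap.
        for size in range(len(query), 0, -1):
            for subset in combinations(sorted(query), size):
                key = frozenset(subset)
                if key in self._store:
                    return key, self._store[key]
        return None, None


cache = SubsetResultCache()
cache.put(["p2p", "caching"], ["doc7", "doc2"])

# The three-term query is not cached, but a two-term subset of it is.
matched, results = cache.lookup(["p2p", "caching", "routing"])
missed, no_results = cache.lookup(["wikipedia"])
```

Here the approximate answer comes from the cached entry for "p2p caching". Whether such an answer is good enough for the full query is exactly the kind of condition the paper says it studies, and this sketch makes no attempt to model it.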