Query-driven indexing for scalable peer-to-peer text retrieval

Authors:
Gleb Skobeltsyn;Toan Luu;Ivana Podnar Žarko;Martin Rajman;Karl Aberer
Affiliations:
Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland;Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland;University of Zagreb, Zagreb, Croatia;Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland;Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Venue:
Proceedings of the 2nd international conference on Scalable information systems
Year:
2007

Citing 13
Cited 8

Information Retrieval: Computational and Theoretical Aspects

Information Retrieval: Computational and Theoretical Aspects
PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities

HPDC '03 Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing
Improving collection selection with overlap awareness in P2P search engines

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Range Queries in Trie-Structured Overlays

P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment

P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Congestion Control for Distributed Hash Tables

NCA '06 Proceedings of the Fifth IEEE International Symposium on Network Computing and Applications
Distributed cache table: efficient query-driven processing of multi-term queries in P2P networks

P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks
Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Query-driven indexing for peer-to-peer text retrieval

Proceedings of the 16th international conference on World Wide Web
Hybrid global-local indexing for efficient peer-to-peer information retrieval

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Efficient peer-to-peer keyword searching

Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware
DL meets p2p – distributed document retrieval based on classification and content

ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries
Federated search of text-based digital libraries in hierarchical peer-to-peer networks

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research

Web text retrieval with a P2P query-driven index

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven indexing for scalable peer-to-peer text retrieval

Future Generation Computer Systems
AlvisP2P: scalable peer-to-peer text retrieval in a structured P2P network

Proceedings of the VLDB Endowment
Peer-to-peer similarity search over widely distributed document collections

Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Aggregation of Document Frequencies in Unstructured P2P Networks

WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Trustworthy acquaintances in Peer-to-Peer (P2P) overlay networks

International Journal of Business Intelligence and Data Mining
A hybrid approach for estimating document frequencies in unstructured P2P networks

Information Systems
Design and evaluation of algorithms for obtaining objective trustworthiness on acquaintances in P2P overlay networks

International Journal of Grid and Utility Computing

Abstract

We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. Bandwidth consumption has been identified as the major problem for the standard P2P approach based on single-term indexing. To cope with it, we use a distributed index that stores at most the top-k document references, and only for carefully chosen combinations of indexing terms. In addition, since the number of possible term combinations extracted from a document collection can be very large, we propose to use query statistics to index only those combinations that users frequently request. By avoiding the maintenance of superfluous indexing information, we achieve a substantial reduction in bandwidth and storage. A specific activation mechanism continuously updates the indexing information according to changes in the query distribution, resulting in an efficient, constantly evolving query-driven indexing structure. We show that the size of the index and the generated indexing and retrieval traffic remain manageable even for web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence for the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval. Moreover, our experiments confirm that the retrieval performance is only slightly lower than that obtained with state-of-the-art centralized query engines.
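
The idea summarized in the abstract can be illustrated with a small, single-process sketch. It is not the authors' implementation: the class name `QueryDrivenIndex`, the parameters `top_k`, `activation_threshold` and `max_key_size`, the term-frequency score, and the fallback that merges truncated single-term lists are all assumptions made for illustration. A plain dictionary stands in for the structured P2P network. The sketch assumes that single terms are always indexed with truncated lists, and that a multi-term combination is indexed only once it has been queried often enough.

```python
from collections import Counter
from itertools import combinations
import heapq


class QueryDrivenIndex:
    """Toy, in-memory stand-in for a distributed query-driven index (hypothetical)."""

    def __init__(self, documents, top_k=2, activation_threshold=2, max_key_size=2):
        # documents: mapping of doc_id -> raw text
        self.documents = {d: text.lower().split() for d, text in documents.items()}
        self.top_k = top_k
        self.activation_threshold = activation_threshold
        self.max_key_size = max_key_size
        self.query_counts = Counter()  # query statistics per term combination
        self.index = {}                # key (frozenset of terms) -> top-k [(score, doc_id)]
        # Assumption: every single term is indexed up front with a truncated list.
        vocabulary = {t for terms in self.documents.values() for t in terms}
        for term in vocabulary:
            self._activate(frozenset([term]))

    def _score(self, key, terms):
        # Assumed scoring: summed term frequency, zero unless all key terms occur.
        counts = Counter(terms)
        if any(counts[t] == 0 for t in key):
            return 0
        return sum(counts[t] for t in key)

    def _activate(self, key):
        # Build the posting list for this key and truncate it to the top-k entries.
        scored = ((self._score(key, terms), d) for d, terms in self.documents.items())
        matches = [(s, d) for s, d in scored if s > 0]
        self.index[key] = heapq.nlargest(self.top_k, matches)

    def search(self, query):
        terms = frozenset(query.lower().split())
        # Update query statistics; activate combinations that became popular.
        for size in range(2, min(len(terms), self.max_key_size) + 1):
            for combo in combinations(sorted(terms), size):
                key = frozenset(combo)
                self.query_counts[key] += 1
                if key not in self.index and self.query_counts[key] >= self.activation_threshold:
                    self._activate(key)
        # Answer directly if the whole query is an indexed key.
        if terms in self.index:
            return [d for _, d in self.index[terms]]
        # Otherwise fall back to merging the truncated single-term lists.
        merged = Counter()
        for term in terms:
            for score, d in self.index.get(frozenset([term]), []):
                merged[d] += score
        best = heapq.nlargest(self.top_k, ((s, d) for d, s in merged.items()))
        return [d for _, d in best]


docs = {
    "d1": "peer peer peer",
    "d2": "retrieval retrieval retrieval",
    "d3": "peer retrieval",
}
index = QueryDrivenIndex(docs, top_k=1, activation_threshold=2)
first = index.search("peer retrieval")   # rare query: answered from truncated lists
second = index.search("peer retrieval")  # now popular: the combination is indexed
print(first, second)  # ['d2'] ['d3']
```

In this toy run the first query misses the only document containing both terms, because each single-term list was truncated to one entry. The second occurrence of the query crosses the assumed threshold, the combination is indexed, and the matching document is returned. This mirrors, in a very simplified way, the trade-off between index size and precision for rare queries described in the abstract.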