Information Retrieval: Computational and Theoretical Aspects
Information Retrieval: Computational and Theoretical Aspects
PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities
HPDC '03 Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing
Improving collection selection with overlap awareness in P2P search engines
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Range Queries in Trie-Structured Overlays
P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Congestion Control for Distributed Hash Tables
NCA '06 Proceedings of the Fifth IEEE International Symposium on Network Computing and Applications
Distributed cache table: efficient query-driven processing of multi-term queries in P2P networks
P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks
Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Query-driven indexing for peer-to-peer text retrieval
Proceedings of the 16th international conference on World Wide Web
Hybrid global-local indexing for effcient peer-to-peer information retrieval
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Efficient peer-to-peer keyword searching
Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware
DL meets p2p – distributed document retrieval based on classification and content
ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries
Federated search of text-based digital libraries in hierarchical peer-to-peer networks
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Web text retrieval with a P2P query-driven index
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven indexing for scalable peer-to-peer text retrieval
Future Generation Computer Systems
AlvisP2P: scalable peer-to-peer text retrieval in a structured P2P network
Proceedings of the VLDB Endowment
Peer-to-peer similarity search over widely distributed document collections
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Aggregation of Document Frequencies in Unstructured P2P Networks
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Trustworthy acquaintances in Peer-to-Peer (P2P) overlay networks
International Journal of Business Intelligence and Data Mining
International Journal of Grid and Utility Computing
Hi-index | 0.00 |
We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption that has been identified as the major problem for the standard P2P approach with single term indexing, we leverage a distributed index that stores up to top-k document references only for carefully chosen indexing term combinations. In addition, since the number of possible term combinations extracted from a document collection can be very large, we propose to use query statistics to index only such combinations that are indeed frequently requested by the users. Thus, by avoiding the maintenance of superfluous indexing information, we achieve a substantial reduction in bandwidth and storage. A specific activation mechanism is applied to continuously update the indexing information according to changes in the query distribution, resulting in an efficient, constantly evolving query-driven indexing structure. We show that the size of the index and the generated indexing/retrieval traffic remains manageable even for web-size document collections at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence about the feasibility of the query-driven indexing strategy for large scale P2P text retrieval. Moreover, our experiments confirm that the retrieval performance is only slightly lower than the one obtained with state-of-the-art centralized query engines.