Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Distributed cache table: efficient query-driven processing of multi-term queries in P2P networks
P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks
Web text retrieval with a P2P query-driven index
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven indexing for scalable peer-to-peer text retrieval
Proceedings of the 2nd international conference on Scalable information systems
Query-driven indexing for scalable peer-to-peer text retrieval
Future Generation Computer Systems
Peer-to-Peer Information Retrieval: An Overview
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.00 |
We describe a query-driven indexing framework for scalable text retrieval over structured P2P networks. To cope with the bandwidth consumption problem that has been identified as the major obstacle for full-text retrieval in P2P networks, we truncate posting lists associated with indexing features to a constant size storing only top-k ranked document references. To compensate for the loss of information caused by the truncation, we extend the set of indexing features with carefully chosen term sets. Indexing term sets are selected based on the query statistics extracted from query logs, thus we index only such combinations that are a) frequently present in user queries and b) non-redundant w.r.t the rest of the index. The distributed index is compact and efficient as it constantly evolves adapting to the current query popularity distribution. Moreover, it is possible to control the tradeoff between the storage/bandwidth requirements and the quality of query answering by tuning the indexing parameters. Our theoretical analysis and experimental results indicate that we can indeed achieve scalable P2P text retrieval for very large document collections and deliver good retrieval performance.