In this paper, we present a query-driven indexing/retrieval strategy for efficient full-text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that occur frequently in user queries, and (2) posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles the addition of new documents to the document collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows the current information needs of the users and changes in the document collection. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for Web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence of the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval.
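
The two properties and the update step can be illustrated with a small single-process sketch. It is not the authors' implementation: the structured P2P network is replaced by an in-memory dictionary, the ranking is a toy sum of term frequencies, and the class name `QueryDrivenIndex` and the parameters `max_postings`, `activation_threshold`, and `max_key_size` are all hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import combinations


class QueryDrivenIndex:
    """Toy, single-process stand-in for a distributed query-driven index."""

    def __init__(self, max_postings=3, activation_threshold=2, max_key_size=2):
        self.max_postings = max_postings                  # bound on every posting list
        self.activation_threshold = activation_threshold  # queries needed to index a combination
        self.max_key_size = max_key_size                  # largest term combination tracked
        self.documents = {}            # doc_id -> Counter of term frequencies
        self.query_counts = Counter()  # term combination -> times seen in queries
        self.active_keys = set()       # multi-term combinations that are indexed
        self.postings = {}             # key -> [(score, doc_id)], best first

    def _score(self, key, tf):
        # Assumed toy ranking: sum of term frequencies, zero if any term is missing.
        if any(tf[t] == 0 for t in key):
            return 0
        return sum(tf[t] for t in key)

    def _insert(self, key, doc_id):
        score = self._score(key, self.documents[doc_id])
        if score == 0:
            return
        plist = self.postings.setdefault(key, [])
        plist.append((score, doc_id))
        plist.sort(key=lambda p: (-p[0], p[1]))
        del plist[self.max_postings:]  # property (2): truncate to the top-ranked elements

    def _activate(self, key):
        # Property (1): a combination is indexed only once queries ask for it often enough.
        self.active_keys.add(key)
        self.postings[key] = []
        for doc_id in self.documents:
            self._insert(key, doc_id)

    def add_document(self, doc_id, text):
        # Index update: assumes doc_id is new; only existing keys are touched.
        self.documents[doc_id] = Counter(text.lower().split())
        for term in self.documents[doc_id]:
            self._insert(frozenset([term]), doc_id)
        for key in self.active_keys:
            self._insert(key, doc_id)

    def query(self, text):
        terms = sorted(set(text.lower().split()))
        for size in range(2, min(self.max_key_size, len(terms)) + 1):
            for combo in combinations(terms, size):
                key = frozenset(combo)
                self.query_counts[key] += 1
                if self.query_counts[key] == self.activation_threshold:
                    self._activate(key)
        full = frozenset(terms)
        if full in self.postings:
            return [doc_id for _, doc_id in self.postings[full]]
        # Fallback: merge the truncated single-term lists (approximate result).
        merged = Counter()
        for term in terms:
            for score, doc_id in self.postings.get(frozenset([term]), []):
                merged[doc_id] += score
        ranked = sorted(merged.items(), key=lambda p: (-p[1], p[0]))
        return [doc_id for doc_id, _ in ranked[: self.max_postings]]
```

In this sketch, the first time a multi-term query is seen it is answered by merging the bounded single-term lists. Once the same combination has been queried `activation_threshold` times, it gets its own bounded posting list, and documents added later update that list directly. Because every list is cut to `max_postings` entries, a lookup never transfers more than a constant number of references. The cost is that a rarely queried combination may miss documents that fell outside the truncated single-term lists.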