In this paper, we present a query-driven indexing and retrieval strategy for efficient full-text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two properties: (1) the generated distributed index stores posting lists for carefully chosen combinations of indexing terms, and (2) posting lists that contain too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size. However, because the number of generated term combinations can still become quite large, we also use term statistics extracted from available query logs to index only those combinations that occur frequently in user queries. By avoiding the generation of superfluous indexing term combinations, we achieve an additional substantial reduction in bandwidth and storage consumption. As a result, the generated distributed index is a constantly evolving, query-driven indexing structure that efficiently follows the current information needs of the users. More precisely, our theoretical analysis and experimental results indicate that, at the price of a marginal loss in retrieval quality for rare queries, the generated index size and network traffic remain manageable even for web-size document collections. Furthermore, our experiments show that the achieved retrieval quality remains fully comparable to that obtained with a state-of-the-art centralized query engine.
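The two properties and the query-log filter described in the abstract can be illustrated with a small, single-process sketch. It is not the authors' implementation: the function names, the `min_count` and `max_postings` parameters, the restriction to combinations of at most two terms, and the term-frequency scoring function are all assumptions made for illustration, and the distribution of the index over a structured P2P network is left out entirely.

```python
from collections import Counter
from itertools import combinations


def frequent_term_combinations(query_log, min_count, max_size=2):
    """Return term combinations (up to max_size terms) that occur in at
    least min_count logged queries. Hypothetical helper for illustration."""
    counts = Counter()
    for query in query_log:
        terms = sorted(set(query.lower().split()))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[combo] += 1
    return {combo for combo, n in counts.items() if n >= min_count}


def term_frequency_score(key, text):
    """Toy ranking function (an assumption): summed term frequency."""
    words = text.lower().split()
    return sum(words.count(term) for term in key)


def build_index(documents, keys, score, max_postings):
    """Map each selected key to its top-ranked documents, truncating every
    posting list to at most max_postings entries."""
    index = {}
    for key in keys:
        matches = [
            (score(key, text), doc_id)
            for doc_id, text in documents.items()
            if all(term in text.lower().split() for term in key)
        ]
        # Highest score first; ties are broken by document id.
        matches.sort(key=lambda pair: (-pair[0], pair[1]))
        index[key] = [doc_id for _, doc_id in matches[:max_postings]]
    return index


if __name__ == "__main__":
    log = ["peer network", "peer network", "peer index", "rare query"]
    docs = {1: "peer peer network", 2: "peer network",
            3: "peer index", 4: "network"}
    keys = frequent_term_combinations(log, min_count=2)
    for key, postings in sorted(build_index(docs, keys,
                                            term_frequency_score, 2).items()):
        print(key, postings)
```

In this sketch, combinations that appear only in rare queries never receive a posting list, which mirrors the trade-off stated in the abstract: a smaller index at the price of reduced retrieval quality for rare queries.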