IR evaluation methods for retrieving highly relevant documents
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Optimal aggregation algorithms for middleware
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities
HPDC '03 Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Efficient top-K query calculation in distributed networks
Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
User modeling for full-text federated search in peer-to-peer networks
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
On synopses for distinct-value estimation under multiset operations
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Full-text federated search in peer-to-peer networks
ACM SIGIR Forum
Web text retrieval with a P2P query-driven index
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Exploiting correlated keywords to improve approximate information filtering
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven indexing for scalable peer-to-peer text retrieval
Future Generation Computer Systems
Query Workload Driven Summarization for P2P Query Routing
P2P '08 Proceedings of the 2008 Eighth International Conference on Peer-to-Peer Computing
Efficient query routing by improved peer description in P2P networks
Proceedings of the 3rd international conference on Scalable information systems
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
On the feasibility of multi-site web search engines
Proceedings of the 18th ACM conference on Information and knowledge management
A scalable and effective full-text search in P2P networks
Proceedings of the 18th ACM conference on Information and knowledge management
Central-rank-based collection selection in uncooperative distributed information retrieval
ECIR'07 Proceedings of the 29th European conference on IR research
A peer-selection algorithm for information retrieval
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Hi-index | 0.00 |
The problem of fully decentralized search over many collections is considered. The objective is to approximate the results of centralized search (namely, using a central index) while controlling the communication cost and involving only a small number of collections. The proposed solution is couched in a peer-to-peer (P2P) network, but can also be applied in other setups. Peers publish per-term summaries of their collections. Specifically, for each term, the range of document scores is divided into intervals; and for each interval, a KMV (K Minimal Values) synopsis of its documents is created. A new peer-selection algorithm uses the KMV synopses and two scoring functions in order to adaptively rank the peers, according to the relevance of their documents to a given query. The proposed method achieves high-quality results while meeting the above criteria of efficiency. In particular, experiments are done on two large, real-world datasets; one is blogs and the other is web data. These experiments show that the algorithm outperforms the state-of-the-art approaches and is robust over different collections, various scoring functions and multi-term queries.