To cope with the rapid growth of the Internet, modern Web search engines adopt distributed architectures in which the collection of indexed documents is partitioned among several servers and queries are answered as a parallel, distributed task. Collection selection can reduce the overall computing load by trading off the quality of the retrieved results against the cost of solving queries. In this paper, we analyze how the collection selection strategy interacts with load balancing and with the caching subsystem, by exploring the design space of a distributed search engine based on collection selection. In particular, we propose a strategy that performs collection selection in a load-driven way, and a novel caching policy that incrementally refines the effectiveness of the results returned on each subsequent cache hit. Combining load-driven collection selection with incremental caching allows our system to retrieve two thirds of the top-ranked results returned by a baseline centralized index, with only one fifth of the computing workload.
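To make the idea of incremental caching concrete, the following is a minimal sketch of one possible reading of it, not the algorithm evaluated in the paper. It assumes that each request may poll only a small number of servers, and that a repeated query (a cache hit) spends that budget on servers not yet polled for it, merging their results into the cached entry. The names `IncrementalCache`, `CacheEntry`, `toy_search`, `TOY_INDEX`, and the parameters `servers_per_request` and `top_n` are hypothetical, as is the toy data; the server ranking is assumed to come from some collection selection function that is not shown, and cache eviction is omitted.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Cached state for one query (hypothetical structure)."""
    results: dict = field(default_factory=dict)  # doc_id -> best score seen so far
    polled: set = field(default_factory=set)     # servers already queried


class IncrementalCache:
    """Result cache that refines an entry on every request for the same query."""

    def __init__(self, search_fn, servers_per_request=1, top_n=3):
        self.search_fn = search_fn                # assumed: (server, query) -> [(doc_id, score)]
        self.servers_per_request = servers_per_request
        self.top_n = top_n
        self.entries = {}                         # query -> CacheEntry (no eviction in this sketch)

    def query(self, q, ranked_servers):
        """Poll the next unpolled servers for q, merge, and return the current top results."""
        entry = self.entries.setdefault(q, CacheEntry())
        pending = [s for s in ranked_servers if s not in entry.polled]
        for server in pending[: self.servers_per_request]:
            for doc_id, score in self.search_fn(server, q):
                previous = entry.results.get(doc_id, float("-inf"))
                entry.results[doc_id] = max(previous, score)
            entry.polled.add(server)
        return heapq.nlargest(self.top_n, entry.results.items(), key=lambda kv: kv[1])


# Toy partitioned index: each server holds a few scored documents (made-up data).
TOY_INDEX = {
    "s1": [("d1", 0.9), ("d4", 0.3)],
    "s2": [("d2", 0.8)],
    "s3": [("d3", 0.95)],
}


def toy_search(server, query):
    """Stand-in for a real index server; ignores the query text."""
    return TOY_INDEX[server]


cache = IncrementalCache(toy_search, servers_per_request=1, top_n=3)
ranking = ["s1", "s2", "s3"]          # assumed output of collection selection
first = cache.query("q", ranking)     # cache miss: polls s1 only
second = cache.query("q", ranking)    # cache hit: additionally polls s2
third = cache.query("q", ranking)     # cache hit: additionally polls s3
```

In this sketch every request costs at most `servers_per_request` server calls, while the answer for a popular query converges toward what polling all servers would return. A load-driven variant could, under the same assumptions, raise or lower the per-request budget according to the current load of each server.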