Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Cluster-based language models for distributed retrieval
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Collection selection and results merging with topically organized U.S. patents and TREC data
Proceedings of the ninth international conference on Information and knowledge management
Information Retrieval
Information Retrieval: Computational and Theoretical Aspects
Information Retrieval: Computational and Theoretical Aspects
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
The Journal of Machine Learning Research
Multi-Tier Architecture for Web Search Engines
LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress
A study of smoothing methods for language models applied to information retrieval
ACM Transactions on Information Systems (TOIS)
Combining the language model and inference network approaches to retrieval
Information Processing and Management: an International Journal - Special issue: Bayesian networks and information retrieval
A Markov random field model for term dependencies
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven document partitioning and collection selection
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Load-balancing and caching for collection selection architectures
Proceedings of the 2nd international conference on Scalable information systems
Efficiency trade-offs in two-tier web search systems
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Sources of evidence for vertical selection
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
SUSHI: scoring scaled samples for server selection
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Classification-based resource selection
Proceedings of the 18th ACM conference on Information and knowledge management
Indexing strategies for graceful degradation of search quality
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Mixture model with multiple centralized retrieval algorithms for result merging in federated search
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Allocating images and selecting image collections for distributed visual search
Proceedings of the 4th International Conference on Internet Multimedia Computing and Service
Shard ranking and cutoff estimation for topically partitioned collections
Proceedings of the 21st ACM international conference on Information and knowledge management
Distributed information retrieval and applications
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Taily: shard selection using the tail of score distributions
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Search result diversification in resource selection for federated search
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Rank-energy selective query forwarding for distributed search systems
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.01 |
Indexes for large collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. For organizations with modest computational resources the high query processing cost incurred in this exhaustive search setup can be a deterrent to working with large collections. This paper investigates document allocation policies that permit searching only a few shards for each query (selective search) without sacrificing search accuracy. Random, source-based and topic-based document-to-shard allocation policies are studied in the context of selective search. A thorough study of the tradeoff between search cost and search accuracy in a sharded index environment is performed using three large TREC collections. The experimental results demonstrate that selective search using topic-based shards cuts the search cost to less than 1/5th of that of the exhaustive search without reducing search accuracy across all the three datasets. Stability analysis shows that 90% of the queries do as well or improve with selective search. An overlap-based evaluation with an additional 1000 queries for each dataset tests and confirms the conclusions drawn using the smaller TREC query sets.