Document allocation policies for selective searching of distributed indexes

Authors:
Anagha Kulkarni;Jamie Callan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Citing 18
Cited 8

Searching distributed collections with inference networks

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Cluster-based language models for distributed retrieval

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Collection selection and results merging with topically organized U.S. patents and TREC data

Proceedings of the ninth international conference on Information and knowledge management
Information Retrieval

Information Retrieval
Information Retrieval: Computational and Theoretical Aspects

Information Retrieval: Computational and Theoretical Aspects
Web Search for a Planet: The Google Cluster Architecture

IEEE Micro
Relevant document distribution estimation method for resource selection

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Latent dirichlet allocation

The Journal of Machine Learning Research
Multi-Tier Architecture for Web Search Engines

LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress
A study of smoothing methods for language models applied to information retrieval

ACM Transactions on Information Systems (TOIS)
Combining the language model and inference network approaches to retrieval

Information Processing and Management: an International Journal - Special issue: Bayesian networks and information retrieval
A Markov random field model for term dependencies

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven document partitioning and collection selection

InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Load-balancing and caching for collection selection architectures

Proceedings of the 2nd international conference on Scalable information systems
Efficiency trade-offs in two-tier web search systems

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Sources of evidence for vertical selection

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
SUSHI: scoring scaled samples for server selection

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Classification-based resource selection

Proceedings of the 18th ACM conference on Information and knowledge management

Indexing strategies for graceful degradation of search quality

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Mixture model with multiple centralized retrieval algorithms for result merging in federated search

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Allocating images and selecting image collections for distributed visual search

Proceedings of the 4th International Conference on Internet Multimedia Computing and Service
Shard ranking and cutoff estimation for topically partitioned collections

Proceedings of the 21st ACM international conference on Information and knowledge management
Distributed information retrieval and applications

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Taily: shard selection using the tail of score distributions

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Search result diversification in resource selection for federated search

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Rank-energy selective query forwarding for distributed search systems

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.01

Visualization

Abstract

Indexes for large collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. For organizations with modest computational resources the high query processing cost incurred in this exhaustive search setup can be a deterrent to working with large collections. This paper investigates document allocation policies that permit searching only a few shards for each query (selective search) without sacrificing search accuracy. Random, source-based and topic-based document-to-shard allocation policies are studied in the context of selective search. A thorough study of the tradeoff between search cost and search accuracy in a sharded index environment is performed using three large TREC collections. The experimental results demonstrate that selective search using topic-based shards cuts the search cost to less than 1/5th of that of the exhaustive search without reducing search accuracy across all the three datasets. Stability analysis shows that 90% of the queries do as well or improve with selective search. An overlap-based evaluation with an additional 1000 queries for each dataset tests and confirms the conclusions drawn using the smaller TREC query sets.