SUSHI: scoring scaled samples for server selection

Authors:
Paul Thomas; Milad Shokouhi
Affiliations:
CSIRO, Canberra, Australia; Microsoft Research, Cambridge, United Kingdom
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

Modern techniques for distributed information retrieval use a set of documents sampled from each server, but these samples have been underutilised in server selection. We describe a new server selection algorithm, SUSHI, which, unlike earlier algorithms, can make full use of the text of each sampled document and does not need training data. SUSHI can directly optimise for many common cases, including high-precision retrieval, and, by including a simple stopping condition, can do so while reducing network traffic. Our experiments compare SUSHI with alternatives and show that it matches the effectiveness of the best current methods while being substantially more efficient, in some cases selecting only 20% as many servers.
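
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea the title suggests: score the sampled documents for a query, scale their ranks up to the size of the full collection, and pick the servers expected to contribute to the merged top k. The curve shape (score linear in log rank), the rank-scaling rule, and all function and variable names are assumptions made for illustration; they are not taken from the paper, and the stopping condition is not modelled.

```python
import math


def scaled_ranks(sample_scores, collection_size):
    """Pair each sampled score with an estimated rank in the full collection.

    Assumption: the i-th best sampled document (0-based) sits near rank
    (i + 0.5) * (collection_size / sample_size) in the full collection.
    """
    scale = collection_size / len(sample_scores)
    ordered = sorted(sample_scores, reverse=True)
    return [((i + 0.5) * scale, score) for i, score in enumerate(ordered)]


def fit_log_curve(points):
    """Least-squares fit of score = a + b * ln(rank); returns (a, b)."""
    xs = [math.log(rank) for rank, _ in points]
    ys = [score for _, score in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # a single sampled document: fall back to a flat curve
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def estimated_top_scores(sample_scores, collection_size, depth):
    """Estimated scores of the server's documents at ranks 1..depth."""
    a, b = fit_log_curve(scaled_ranks(sample_scores, collection_size))
    return [a + b * math.log(rank) for rank in range(1, depth + 1)]


def select_servers(samples, sizes, k=10):
    """Servers expected to place documents in the merged top k.

    samples: server name -> scores of its sampled documents for the query
    sizes:   server name -> estimated number of documents on that server
    """
    candidates = []
    for server, scores in samples.items():
        if not scores:
            continue  # no sampled document matched the query
        for estimate in estimated_top_scores(scores, sizes[server], k):
            candidates.append((estimate, server))
    candidates.sort(reverse=True)

    counts = {}
    for _, server in candidates[:k]:
        counts[server] = counts.get(server, 0) + 1
    # Most expected contributions first; ties broken by name.
    return sorted(counts, key=lambda s: (-counts[s], s))


if __name__ == "__main__":
    samples = {"A": [0.9, 0.8, 0.7], "B": [0.3, 0.2, 0.1]}
    sizes = {"A": 300, "B": 300}
    print(select_servers(samples, sizes, k=10))
```

In this sketch a server is selected only if at least one of its estimated scores lands in the merged top k, which is one way a selection method could end up contacting fewer servers than a fixed cutoff would.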