Efficiency trade-offs in two-tier web search systems

Authors:
Ricardo Baeza-Yates; Vanessa Murdock; Claudia Hauff
Affiliations:
Yahoo!, Barcelona, Spain; Yahoo!, Barcelona, Spain; University of Twente, Enschede, Netherlands
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

Search engines rely on searching multiple partitioned corpora to return results to users in a reasonable amount of time. In this paper we analyze the standard two-tier architecture for Web search with the difference that the corpus to be searched for a given query is predicted in advance. We show that any predictor better than random yields time savings, but this decrease in the processing time yields an increase in the infrastructure cost. We provide an analysis and investigate this trade-off in the context of two different scenarios on real-world data. We demonstrate that in general the decrease in answer time is justified by a small increase in infrastructure cost.
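The trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes a hypothetical setup in which every query is first sent to tier 1, queries that tier 1 cannot answer fall through to tier 2, and a predictor may flag a query so that both tiers are searched in parallel. All names and numbers (`TwoTierModel`, `t1`, `t2`, `p_miss`, `tpr`, `fpr`) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TwoTierModel:
    """Hypothetical two-tier setup; all values are illustrative."""
    t1: float      # time to search tier 1
    t2: float      # time to search tier 2, assumed >= t1
    p_miss: float  # fraction of queries tier 1 cannot answer


def sequential(model: TwoTierModel) -> tuple[float, float]:
    """Baseline: search tier 1 first, fall through to tier 2 on a miss."""
    expected_time = model.t1 + model.p_miss * model.t2
    tier2_load = model.p_miss  # only misses ever reach tier 2
    return expected_time, tier2_load


def predicted(model: TwoTierModel, tpr: float, fpr: float) -> tuple[float, float]:
    """Flagged queries are sent to both tiers in parallel.

    tpr: fraction of true tier-2 queries the predictor flags.
    fpr: fraction of tier-1 queries the predictor flags by mistake.
    """
    p = model.p_miss
    # Flagged miss: answered after t2 (parallel search, t2 >= t1).
    # Unflagged miss: answered after t1 + t2 (sequential fallback).
    miss_time = tpr * model.t2 + (1 - tpr) * (model.t1 + model.t2)
    # Tier-1 queries are answered after t1 whether flagged or not.
    expected_time = p * miss_time + (1 - p) * model.t1
    # False positives add wasted work on tier 2.
    tier2_load = p + (1 - p) * fpr
    return expected_time, tier2_load


model = TwoTierModel(t1=1.0, t2=4.0, p_miss=0.25)
base_time, base_load = sequential(model)
pred_time, pred_load = predicted(model, tpr=0.8, fpr=0.1)
print(f"sequential: time={base_time:.3f}, tier-2 load={base_load:.3f}")
print(f"predicted:  time={pred_time:.3f}, tier-2 load={pred_load:.3f}")
```

Under these assumptions the expected time drops by `p_miss * tpr * t1`, while the share of queries reaching tier 2 grows by `(1 - p_miss) * fpr`. That mirrors, in miniature, the time saving versus infrastructure cost trade-off the abstract refers to.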