Hybrid query scheduling for a replicated search engine

Authors:
Ana Freire;Craig Macdonald;Nicola Tonellotto;Iadh Ounis;Fidel Cacheda
Affiliations:
University of A Coruña, A Coruña, Spain;University of Glasgow, Glasgow, UK;National Research Council of Italy, Pisa, Italy;University of Glasgow, Glasgow, UK;University of A Coruña, A Coruña, Spain
Venue:
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Year:
2013

Abstract

Search engines replicate and distribute large indices across many query servers to achieve efficient retrieval. Under high query load, queries can be scheduled to the replicas expected to be idle soonest, using predicted query response times. However, the overhead of making response time predictions can hinder the usefulness of query scheduling under low query load. In this paper, we propose a hybrid scheduling approach that combines the scheduling methods appropriate for low and high load conditions and adapts as conditions change. We deploy a simulation framework populated with actual and predicted response times for one full day of real Web search queries. Our experiments, using different numbers of shards and replicas of the 50-million-document ClueWeb09 corpus, show that hybrid scheduling can reduce the average waiting time of one day of queries by 68% under high load conditions and by 7% under low load conditions, compared with traditional scheduling methods.
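
The general idea of switching between a prediction-free policy and a prediction-based policy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how load is detected or which low-load method is used, so the switch rule (mean queue length against `threshold`), the low-load policy (shortest queue), the `default_cost` charged when no prediction is made, and all class and method names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    queued: int = 0        # queries waiting or running on this replica
    backlog: float = 0.0   # estimated seconds of work still queued


class HybridScheduler:
    """Illustrative hybrid scheduler; all parameters are hypothetical."""

    def __init__(self, replicas, threshold=1.0, default_cost=0.1):
        self.replicas = list(replicas)
        self.threshold = threshold        # assumed switch point (mean queue length)
        self.default_cost = default_cost  # assumed cost when no prediction is made

    def high_load(self):
        mean_queue = sum(r.queued for r in self.replicas) / len(self.replicas)
        return mean_queue >= self.threshold

    def schedule(self, query, predict):
        if self.high_load():
            # High load: pay for a prediction and pick the replica
            # expected to be idle soonest (smallest estimated backlog).
            cost = predict(query)
            target = min(self.replicas, key=lambda r: r.backlog)
        else:
            # Low load: skip the predictor and pick the shortest queue.
            cost = self.default_cost
            target = min(self.replicas, key=lambda r: r.queued)
        target.queued += 1
        target.backlog += cost
        return target, cost

    def complete(self, replica, cost):
        # Called when a query finishes, to release its estimated work.
        replica.queued -= 1
        replica.backlog = max(0.0, replica.backlog - cost)
```

In this sketch `predict` is any callable that maps a query to an estimated response time in seconds. It is invoked only on the high-load path, which is how the sketch avoids prediction overhead when queues are short.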