Load balancing distributed inverted files

Authors:
Mauricio Marin; Carlos Gomez
Affiliations:
Yahoo! Research, Santiago, Chile; Yahoo! Research, Santiago, Chile
Venue:
Proceedings of the 9th annual ACM international workshop on Web information and data management
Year:
2007

Abstract

This paper presents a comparison of scheduling algorithms for load balancing query traffic on distributed inverted files. We implemented a number of algorithms taken from the literature and propose a novel formulation of the cost of query processing so that these algorithms can be used to schedule queries onto processors. We avoid measuring load balance at the search engine side because this can lead to imprecise evaluation. Instead, our method simulates a bulk-synchronous parallel computer at the broker machine. This simulation determines an optimal way of processing the queries and provides a stable baseline against which both the broker and the search engine can tune their operation according to the observed query traffic. We conclude that the simplest load balancing heuristics are good enough to achieve efficient performance. In practice, broker machines can use our method to schedule queries efficiently onto the cluster processors of search engines.
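
To make the idea of a simple load balancing heuristic concrete, the following is a minimal sketch of a greedy least-loaded scheduler of the kind a broker might run. It is an illustration only and not the authors' method or cost formulation: the function names, the per-query cost values, and the efficiency measure (average load divided by maximum load) are all assumptions introduced here.

```python
import heapq


def schedule_least_loaded(costs, num_processors):
    """Greedy heuristic: send each query to the currently least-loaded processor.

    `costs` is a hypothetical list of estimated per-query processing costs;
    how such costs are derived is not specified here.
    """
    # Min-heap of (accumulated load, processor id); ties go to the lower id.
    heap = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(heap)

    assignment = []
    for cost in costs:
        load, proc = heapq.heappop(heap)
        assignment.append(proc)
        heapq.heappush(heap, (load + cost, proc))

    # Recover the final load of every processor from the heap contents.
    loads = [0.0] * num_processors
    for load, proc in heap:
        loads[proc] = load
    return assignment, loads


def balance_efficiency(loads):
    """Average load over maximum load; 1.0 means perfectly balanced (assumed metric)."""
    peak = max(loads)
    return sum(loads) / (len(loads) * peak) if peak > 0 else 1.0


if __name__ == "__main__":
    assignment, loads = schedule_least_loaded([5, 3, 2, 4, 1], num_processors=2)
    print(assignment, loads, balance_efficiency(loads))
```

Sorting the costs in decreasing order before calling `schedule_least_loaded` would give the classic longest-processing-time-first variant, at the price of needing a batch of queries in hand rather than scheduling them as they arrive.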