Load balancing for term-distributed parallel retrieval

Authors:
Alistair Moffat; William Webber; Justin Zobel
Affiliations:
The University of Melbourne, Victoria, Australia; The University of Melbourne, Victoria, Australia; RMIT University, Victoria, Australia
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Abstract

Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query throughput rates, parallel systems are used, in which the document and index data are split across tightly-clustered distributed computing systems. The index data can be distributed either by document or by term. In this paper we examine methods for load balancing in term-distributed parallel architectures, and propose a suite of techniques for reducing net querying costs. In combination, the techniques we describe allow a 30% improvement in query throughput when tested on an eight-node parallel computer system.
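
The two index layouts named in the abstract can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the toy documents, the function names, and the workload estimate (posting-list length) are all assumptions made for illustration, and the term assignment shown is a generic greedy heuristic (heaviest term first, onto the least-loaded node).

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def partition_by_document(docs, num_nodes):
    """Document distribution: each node indexes its own subset of documents."""
    shards = [dict() for _ in range(num_nodes)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % num_nodes][doc_id] = docs[doc_id]
    return [build_index(shard) for shard in shards]


def partition_by_term(index, num_nodes, workload):
    """Term distribution: each node holds complete posting lists for some terms.

    Hypothetical greedy balancing: take terms in decreasing order of
    estimated workload and give each to the currently least-loaded node.
    """
    nodes = [dict() for _ in range(num_nodes)]
    loads = [0] * num_nodes
    for term in sorted(index, key=lambda t: (-workload(t), t)):
        target = loads.index(min(loads))
        nodes[target][term] = index[term]
        loads[target] += workload(term)
    return nodes, loads


# Toy collection (assumed data, not from the paper).
docs = {
    1: "load balancing index",
    2: "index partition",
    3: "query index load",
    4: "query throughput",
}
index = build_index(docs)

doc_shards = partition_by_document(docs, 2)

# Assumed workload estimate: the length of the term's posting list.
term_nodes, loads = partition_by_term(index, 2, lambda t: len(index[t]))

print("document-distributed:", doc_shards)
print("term-distributed:", term_nodes)
print("estimated load per node:", loads)
```

In the document-distributed layout a frequent term such as "index" appears on every node, so every node takes part in every query. In the term-distributed layout each term lives on exactly one node, so a query touches only the nodes that hold its terms, and the balance between nodes depends on how terms are assigned. A realistic workload estimate would presumably also account for how often each term occurs in queries; that refinement is left out of this sketch.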