Performance comparison of clustered and replicated information retrieval systems

Authors:
Fidel Cacheda;Victor Carneiro;Vassilis Plachouras;Iadh Ounis
Affiliations:
Department of Information and Communication Technologies, University of A Coruña, Facultad de Informática, Coruña, Spain;Department of Information and Communication Technologies, University of A Coruña, Facultad de Informática, Coruña, Spain;Yahoo! Research, Barcelona, Spain;Department of Computing Science, University of Glasgow, Glasgow, UK
Venue:
ECIR'07 Proceedings of the 29th European conference on IR research
Year:
2007

Abstract

The amount of information available on the Internet grows daily, and with it the importance and scale of Web search engines. Systems based on a single centralised index have several problems, such as lack of scalability, which has led to the use of distributed information retrieval systems to search for and locate the required information effectively. A distributed retrieval system can be clustered, replicated, or both. In this paper, we use simulations to present a detailed performance analysis, in terms of both throughput and response time, of a clustered system compared to a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that a clustered system does not outperform the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic; however, a switched network eliminates the network bottleneck, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative effect on performance of changes over time in the query topics when a distributed clustered system is used. In contrast, the performance of a distributed replicated system is query independent.
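
The query-independence point in the abstract can be illustrated with a toy load model. This is only a minimal sketch under strong assumptions, not the simulation used in the paper: every query costs one unit of work, a replicated system can send any query to any server (round-robin here), and a clustered system sends each query to the single server that holds its topic. The function names, the topic-to-server mapping, and the query mixes are all hypothetical.

```python
def replicated_loads(queries, n_servers):
    """Round-robin routing: every server holds a full replica,
    so the query topic plays no role in where the query goes."""
    loads = [0] * n_servers
    for i, _topic in enumerate(queries):
        loads[i % n_servers] += 1
    return loads


def clustered_loads(queries, n_servers):
    """Topic routing: each topic lives on exactly one server
    (hypothetical mapping: topic id modulo the number of servers)."""
    loads = [0] * n_servers
    for topic in queries:
        loads[topic % n_servers] += 1
    return loads


def imbalance(loads):
    """Ratio of the busiest server's load to the mean load (1.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Hypothetical query mixes over four topics, 100 queries each.
uniform = [0, 1, 2, 3] * 25                      # topics equally popular
drifted = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10  # topic 0 becomes popular

for name, mix in (("uniform", uniform), ("drifted", drifted)):
    print(name,
          "replicated:", imbalance(replicated_loads(mix, 4)),
          "clustered:", imbalance(clustered_loads(mix, 4)))
```

In this toy model the replicated imbalance stays at 1.0 for both mixes, while the clustered imbalance rises once one topic dominates. It ignores network traffic, index size, and per-query cost, all of which the paper's analysis takes into account.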