Performance comparison of clustered and replicated information retrieval systems

  • Authors:
  • Fidel Cacheda; Victor Carneiro; Vassilis Plachouras; Iadh Ounis

  • Affiliations:
  • Department of Information and Communication Technologies, University of A Coruña, Facultad de Informática, Coruña, Spain; Department of Information and Communication Technologies, University of A Coruña, Facultad de Informática, Coruña, Spain; Yahoo! Research, Barcelona, Spain; Department of Computing Science, University of Glasgow, Glasgow, UK

  • Venue:
  • ECIR'07: Proceedings of the 29th European Conference on IR Research
  • Year:
  • 2007

Abstract

The amount of information available on the Internet grows daily, and so do the importance and scale of Web search engines. Systems based on a single centralised index present several problems (such as a lack of scalability), which has led to the use of distributed information retrieval systems to search for and locate the required information effectively. A distributed retrieval system can be clustered and/or replicated. In this paper, using simulations, we present a detailed performance analysis, in terms of both throughput and response time, of a clustered system compared with a replicated system. In addition, we consider the effect of changes in query topics over time. We show that a clustered system does not outperform the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic. However, the use of a switched network eliminates the network bottleneck, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative effect on performance of changes in query topics over time when a distributed clustered system is used. In contrast, the performance of a distributed replicated system is query-independent.
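The contrast between topic-dependent and query-independent load can be illustrated with a deliberately simplified toy model. This is not the authors' simulation. It assumes (hypothetically) that the replicated system dispatches each query round-robin to one full-index replica, and that the clustered system routes each query only to the single cluster holding its topic. The topic names and the `topic_to_cluster` mapping are invented for illustration. Under a balanced topic mix both layouts spread work evenly, but when topics drift toward one cluster, that cluster becomes the bottleneck while the replicas stay balanced.

```python
from collections import Counter


# Toy model (assumption): every replica holds the full index, and the broker
# dispatches queries round-robin, ignoring the query topic entirely.
def replicated_load(queries: list[str], n_replicas: int) -> list[int]:
    load = [0] * n_replicas
    for i, _topic in enumerate(queries):
        load[i % n_replicas] += 1
    return load


# Toy model (assumption): documents are partitioned by topic, and each query
# is routed only to the one cluster that holds its topic.
def clustered_load(queries: list[str], topic_to_cluster: dict[str, int],
                   n_clusters: int) -> list[int]:
    counts = Counter(topic_to_cluster[t] for t in queries)
    return [counts[c] for c in range(n_clusters)]


# Hypothetical topics and partitioning, fixed when the clusters were built.
topic_to_cluster = {"sports": 0, "news": 1, "tech": 2}

balanced = ["sports", "news", "tech"] * 4      # topic mix at build time
drifted = ["sports"] * 10 + ["news", "tech"]   # topic mix after drift

print(replicated_load(balanced, 3), clustered_load(balanced, topic_to_cluster, 3))
# [4, 4, 4] [4, 4, 4]
print(replicated_load(drifted, 3), clustered_load(drifted, topic_to_cluster, 3))
# [4, 4, 4] [10, 1, 1]
```

The busiest node bounds throughput in this toy model, so after the drift the clustered layout is limited by a node handling 10 of 12 queries, while the replicated layout is unaffected. The real systems studied in the paper also model network costs, which this sketch ignores.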