Independence of contributing retrieval strategies in data fusion for effective information retrieval

  • Authors:
  • Alan F. Smeaton

  • Affiliations:
  • School of Computer Applications, Dublin City University, Dublin 9, Ireland

  • Venue:
  • IRSG'98: Proceedings of the 20th Annual BCS-IRSG Conference on Information Retrieval Research
  • Year:
  • 1998

Abstract

In information retrieval, data fusion is a technique for combining the outputs of more than one retrieval strategy, each of which ranks documents for retrieval. A frequent observation about data fusion in IR is that fusing the document rankings from different implementations can yield a level of retrieval effectiveness better than that of any of the individual input strategies. This phenomenon has been shown repeatedly in TREC and elsewhere in IR research, and it has generally been found to hold when the implementations are based on conceptually different approaches. In this paper we explore this hypothesis using a text retrieval application on over 200 Mbytes of Spanish newspaper text, with a fixed set of queries for which the relevant documents are known. Taking 9 different retrieval strategies used by different groups in TREC-4, we fuse their document rankings in various combinations to see whether there is a correlation between the perceived conceptual independence of the ranking strategies being combined and the observed improvement, or otherwise, in retrieval effectiveness from data fusion. Although our experiments use text retrieval on Spanish texts, the principles we explore hold true for engineering any kind of information system based on combining the ranked retrieval of objects.
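The abstract does not say which fusion rule was applied, so the following is only a minimal sketch assuming CombMNZ, a widely used score-fusion rule: each run's scores are min-max normalised to [0, 1], summed per document, and the sum is multiplied by the number of runs that retrieved that document. It then compares the fused ranking's average precision with that of each input run, which is the kind of "better than any individual input" comparison the abstract describes. All document IDs, scores and relevance judgements are invented toy data, and the function names (`comb_mnz`, `min_max_normalise`, `ranked_docs`) are hypothetical.

```python
from collections import defaultdict

# One strategy's output: document id -> retrieval score (toy data, hypothetical format).
Ranking = dict[str, float]


def min_max_normalise(run: Ranking) -> Ranking:
    """Rescale one strategy's scores to [0, 1] so different runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as maximally scored.
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def ranked_docs(run: Ranking) -> list[str]:
    """Order a run by descending score, breaking ties by document id."""
    return sorted(run, key=lambda d: (-run[d], d))


def comb_mnz(runs: list[Ranking]) -> list[str]:
    """Assumed fusion rule (CombMNZ): summed normalised score x number of runs retrieving the doc."""
    total: dict[str, float] = defaultdict(float)
    hits: dict[str, int] = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalise(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return ranked_docs(fused)


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Non-interpolated average precision of one ranked list for one query."""
    found, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            found += 1
            precision_sum += found / rank
    return precision_sum / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Two hypothetical strategies with scores on different scales.
    run_a = {"d1": 0.9, "d2": 0.8, "d3": 0.3, "d4": 0.1}
    run_b = {"d5": 12.0, "d1": 9.0, "d3": 8.0, "d6": 2.0}
    relevant = {"d1", "d3"}

    fused = comb_mnz([run_a, run_b])
    print("fused ranking:", fused)
    print(f"AP run A: {average_precision(ranked_docs(run_a), relevant):.3f}")
    print(f"AP run B: {average_precision(ranked_docs(run_b), relevant):.3f}")
    print(f"AP fused: {average_precision(fused, relevant):.3f}")
```

On this invented data, both relevant documents are retrieved by both runs, so CombMNZ's multiplier pushes them to the top and the fused list scores higher than either input. Whether such gains appear on real runs, and how they relate to how conceptually different the inputs are, is the question the paper investigates; the toy numbers say nothing about that.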