Fusing different information retrieval systems according to query-topics: a study based on correlation in information retrieval systems and TREC topics

  • Authors:
  • Anthony Bigot;Claude Chrisment;Taoufiq Dkaki;Gilles Hubert;Josiane Mothe

  • Affiliations:
  • Institut de Recherche en Informatique de Toulouse, UMR 5505, CNRS, Université de Toulouse, 31062 Toulouse Cedex 04, France (all five authors)

  • Venue:
  • Information Retrieval
  • Year:
  • 2011

Abstract

Evaluation programs such as TREC provide a rigorous methodology and benchmark collections for measuring the effectiveness of information retrieval systems. Whatever the collection used, effectiveness is generally reported globally, as an average over a set of information needs. This averaging hides the variability of system performance: the similarities and differences between systems are smoothed out, and the topics on which a given system succeeds or fails remain unknown. In this paper we propose an approach based on data analysis methods (correspondence analysis and clustering) to discover correlations between systems and to find trends in topic/system correlations. We show that topics and systems can be clustered according to system performance on these topics, with some system clusters performing better on some topics. Finally, we propose a new method for identifying complementary systems based on their performance, which can be applied, for example, to repeated queries. The method builds system profiles from the similarity of the sets of TREC topics on which systems achieve similar levels of performance. We show that this method is effective on the TREC ad hoc collection.
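
As a rough illustration of why averaged effectiveness can hide complementary behaviour, the sketch below compares two systems with the same mean score over four topics. It is not the authors' method (which relies on correspondence analysis and clustering); the system names, the scores, and the helper functions `pearson` and `best_system_per_topic` are all hypothetical, and the per-topic selection assumes the best system for each topic is already known, as might be the case for a repeated query.

```python
from math import sqrt

# Hypothetical per-topic average precision: one list per system,
# one entry per topic. These numbers are invented for illustration.
ap = {
    "sysA": [0.8, 0.6, 0.2, 0.4],
    "sysB": [0.2, 0.4, 0.8, 0.6],
}

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_system_per_topic(scores):
    """For each topic index, return the name of the top-scoring system."""
    n_topics = len(next(iter(scores.values())))
    return [max(scores, key=lambda s: scores[s][t]) for t in range(n_topics)]

# Global view: mean score per system over all topics.
map_scores = {name: mean(vals) for name, vals in ap.items()}

# Per-topic view: how similarly do the two systems behave across topics?
correlation = pearson(ap["sysA"], ap["sysB"])

# Selecting the better system topic by topic (an upper bound that assumes
# the best system per topic is known in advance).
choice = best_system_per_topic(ap)
selected_map = mean([ap[s][t] for t, s in enumerate(choice)])

print(map_scores, correlation, choice, selected_map)
```

In this toy data both systems have a mean of about 0.5 and look identical on average, yet their per-topic scores are perfectly anti-correlated, and choosing the better system for each topic raises the mean to about 0.7.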