Fusing different information retrieval systems according to query-topics: a study based on correlation in information retrieval systems and TREC topics

Authors:
Anthony Bigot; Claude Chrisment; Taoufiq Dkaki; Gilles Hubert; Josiane Mothe
Affiliations:
Institut de Recherche en Informatique de Toulouse, UMR 5505, CNRS, Université de Toulouse, Toulouse Cedex 04, France 31062
Venue:
Information Retrieval
Year:
2011

Abstract

To evaluate the effectiveness of information retrieval systems, evaluation programs such as TREC provide a rigorous methodology as well as benchmark collections. Whatever the collection used, effectiveness is generally assessed globally, by averaging results over a set of information needs. As a result, the variability of system performance is hidden, because the similarities and differences between systems are averaged out. Moreover, the topics on which a given system succeeds or fails remain unknown. In this paper we propose an approach based on data analysis methods (correspondence analysis and clustering) to discover correlations between systems and to find trends in topic/system correlations. We show that topics and systems can be clustered according to system performance on these topics, with some system clusters performing better on certain topics. Finally, we propose a new method for identifying complementary systems based on their performance, which can be applied, for example, to repeated queries. The method relies on system profiles, defined from the similarity of the sets of TREC topics on which systems achieve similar levels of performance. We show that this method is effective on the TREC ad hoc collection.
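The abstract does not spell out how profiles are built or how complementary systems are fused, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes that a system's profile is the set of topics on which its average precision (AP) exceeds the mean AP of all systems on that topic. It also assumes that profile overlap is measured with the Jaccard coefficient, that the least-overlapping pair counts as "complementary", and that their runs are merged with CombSUM over min-max normalized scores (a standard fusion operator swapped in for illustration). The AP values, run scores, system names and helper functions are all invented.

```python
from itertools import combinations
from statistics import mean

# Hypothetical per-topic average precision (AP) for three systems.
# These numbers are invented for illustration; they are not TREC results.
AP = {
    "sysA": {"t1": 0.62, "t2": 0.10, "t3": 0.55, "t4": 0.08},
    "sysB": {"t1": 0.15, "t2": 0.58, "t3": 0.12, "t4": 0.49},
    "sysC": {"t1": 0.60, "t2": 0.40, "t3": 0.50, "t4": 0.11},
}


def profile(ap, system):
    """Topics on which `system` beats the mean AP of all systems (assumed definition)."""
    return frozenset(
        t for t in ap[system]
        if ap[system][t] > mean(ap[s][t] for s in ap)
    )


def jaccard(a, b):
    """Jaccard similarity of two topic sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_complementary_pair(ap):
    """Pair of systems whose profiles overlap least (hypothetical selection rule)."""
    profiles = {s: profile(ap, s) for s in ap}
    return min(
        combinations(sorted(ap), 2),
        key=lambda pair: jaccard(profiles[pair[0]], profiles[pair[1]]),
    )


def min_max(run):
    """Rescale a run's scores to [0, 1]; a constant run maps every document to 1.0."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """CombSUM fusion: sum normalized scores; a document missing from a run adds 0."""
    fused = {}
    for run in runs:
        for doc, score in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical retrieval scores for one query from each system.
RUNS = {
    "sysA": {"d1": 12.0, "d2": 9.0, "d3": 3.0},
    "sysB": {"d2": 0.9, "d4": 0.7, "d1": 0.2},
    "sysC": {"d1": 5.0, "d3": 4.0, "d2": 1.0},
}

if __name__ == "__main__":
    pair = most_complementary_pair(AP)
    print(pair)
    print(comb_sum(RUNS[s] for s in pair))
```

Run as a script, this prints `('sysA', 'sysB')` and the fused ranking `['d2', 'd1', 'd4', 'd3']`. sysA and sysB succeed on disjoint topics, so their profiles have zero overlap. In a repeated-query setting, such a pair would be chosen from past performance and then fused when the query recurs. Jaccard overlap is used only because it is a simple set similarity. The paper's own analysis relies on correspondence analysis and clustering, which this sketch does not reproduce.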