Independence of contributing retrieval strategies in data fusion for effective information retrieval

Authors:
Alan F. Smeaton
Affiliations:
School of Computer Applications, Dublin City University, Dublin 9, Ireland
Venue:
IRSG'98 Proceedings of the 20th Annual BCS-IRSG conference on Information Retrieval Research
Year:
1998

Abstract

In information retrieval (IR), data fusion is a technique for combining the outputs of several retrieval strategies, each of which ranks documents for retrieval. A common observation about data fusion in IR is that fusing document rankings from different implementations can yield a level of effectiveness better than that of any individual input strategy. This phenomenon has been shown repeatedly in TREC and elsewhere in IR research, and it has generally been found to hold when the implementations are based on conceptually different approaches. In this paper we explore this hypothesis using a text retrieval application on over 200 Mbytes of Spanish newspaper text, with a fixed set of queries for which the relevant documents are known. Using 9 different retrieval strategies employed by different groups in TREC-4, we fuse document rankings in different combinations to see whether there is a correlation between the perceived conceptual independence of the document ranking strategies and the observed improvement, or otherwise, in retrieval effectiveness from data fusion. Although the application in our experiments is text retrieval on Spanish texts, the principles we explore hold for engineering any kind of information system based on combining the ranked retrieval of objects.
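
To make the idea of fusing ranked outputs concrete, the following is a minimal sketch of one well-known score-based fusion scheme, CombSUM with min-max normalization. The abstract does not say which fusion method the paper uses, so this choice is an assumption made purely for illustration. The function names, document identifiers, and scores are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one strategy's scores to [0, 1] so that runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tied: give each the same (maximal) normalized score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """Fuse several runs by summing each document's normalized scores.

    `runs` is a list of dicts mapping document id -> retrieval score.
    A document missing from a run contributes 0 for that run.
    Returns document ids ordered by fused score, best first
    (ties broken by document id for determinism).
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical output of two retrieval strategies for a single query.
run_a = {"d1": 9.0, "d2": 7.0, "d3": 1.0}
run_b = {"d2": 0.8, "d4": 0.6, "d1": 0.2}

fused_ranking = comb_sum([run_a, run_b])
print(fused_ranking)
```

In this toy example, document d2 is ranked highly by both strategies and so rises to the top of the fused list, while documents found by only one strategy fall below it. Whether such a fusion improves effectiveness on real data depends on the input strategies, which is the question the paper investigates.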