Fusion Via a Linear Combination of Scores

  • Authors:
  • Christopher C. Vogt; Garrison W. Cottrell

  • Affiliations:
  • Computer Science and Engineering, University of California San Diego, La Jolla CA 92093-0114, USA. vogt@cs.ucsd.edu; Computer Science and Engineering, University of California San Diego, La Jolla CA 92093-0114, USA. gary@cs.ucsd.edu

  • Venue:
  • Information Retrieval
  • Year:
  • 1999

Abstract

We present a thorough analysis of the capabilities of the linear combination (LC) model for fusion of information retrieval systems. The LC model combines the results lists of multiple IR systems by scoring each document using a weighted sum of the scores from each of the component systems. We first present both empirical and analytical justification for the hypotheses that such a model should only be used when the systems involved have high performance, a large overlap of relevant documents, and a small overlap of nonrelevant documents. The empirical approach allows us to very accurately predict the performance of a combined system. We also derive a formula for a theoretically optimal weighting scheme for combining 2 systems. We introduce d, the difference between the average score on relevant documents and the average score on nonrelevant documents, as a performance measure which not only allows mathematical reasoning about system performance, but also allows the selection of weights which generalize well to new documents. We describe a number of experiments involving large numbers of different IR systems which support these findings.
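
To make the two central quantities of the abstract concrete, the following is a minimal Python sketch of a weighted-sum fusion and of the measure d. It is an illustration only, not the authors' implementation: the function names `linear_combination` and `d_measure`, the toy scores, and the weights are all hypothetical, and the sketch assumes that scores from different systems are already on comparable scales and that a document missing from a system's list contributes a score of 0. The abstract specifies neither of those choices.

```python
from collections import defaultdict


def linear_combination(score_lists, weights):
    """Fuse per-system {doc_id: score} dicts with a weighted sum of scores.

    Assumption (not from the abstract): a document absent from a system's
    list contributes 0 for that system.
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per system")
    fused = defaultdict(float)
    for scores, weight in zip(score_lists, weights):
        for doc_id, score in scores.items():
            fused[doc_id] += weight * score
    return dict(fused)


def d_measure(scores, relevant):
    """Mean score on relevant documents minus mean score on nonrelevant ones."""
    rel = [s for doc, s in scores.items() if doc in relevant]
    non = [s for doc, s in scores.items() if doc not in relevant]
    if not rel or not non:
        raise ValueError("need at least one relevant and one nonrelevant document")
    return sum(rel) / len(rel) - sum(non) / len(non)


# Hypothetical toy data: two systems scoring overlapping document sets.
system_a = {"d1": 1.0, "d2": 0.5, "d3": 0.25}
system_b = {"d1": 0.5, "d2": 1.0, "d4": 0.5}

fused = linear_combination([system_a, system_b], weights=[0.75, 0.25])
ranking = sorted(fused, key=fused.get, reverse=True)

print(ranking)
print(d_measure(fused, relevant={"d1", "d2"}))
```

In this sketch a larger d means the fused scores separate relevant from nonrelevant documents more widely on average. Choosing weights to maximize d on training queries is one way to read the abstract's remark about weight selection, though the paper's actual procedure is not given here.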