Combination of document priors in web information retrieval

  • Authors:
  • Jie Peng; Craig Macdonald; Ben He; Iadh Ounis

  • Affiliations:
  • University of Glasgow, Glasgow; University of Glasgow, Glasgow; University of Glasgow, Glasgow; University of Glasgow, Glasgow

  • Venue:
  • Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
  • Year:
  • 2007

Abstract

Query-independent features (also called document priors), such as the number of incoming links to a document, its PageRank, or the type of its associated URL, have been successfully integrated into Web Information Retrieval systems to enhance retrieval effectiveness. Combining several document priors could enhance retrieval performance further. However, most current approaches for combining priors are based on heuristics and often ignore the possible dependence between the document priors. In this paper, we present a novel and robust method for conditionally combining document priors in a principled way. The approach adjusts the distribution of document priors for one source of evidence according to the distribution of document priors for other sources of evidence. We investigate the retrieval performance attainable by our combination method, in comparison to the use of single priors and to a heuristic method for combining document priors that assumes the priors are independent. Furthermore, we investigate how sensitive the proposed method is to the training data. Using two standard Web test collections, including the large-scale GOV2 test collection, we find that some of the document priors used in our experiments show a considerably high correlation, suggesting that the dependence between document priors should indeed be taken into account. Through extensive experiments on these two large-scale collections, we observe that our proposed conditional combination method is effective and robust overall.
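To make the contrast between an independence-assuming heuristic and a conditional combination concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation. It assumes that each prior (for example, inlink count or PageRank) has already been discretised into bins, that P(relevant | bin) is estimated from training relevance judgements with add-one smoothing, and that the combined prior is added in log space to a query-dependent score. The function names (`rel_prob`, `independent_prior`, `conditional_prior`, `pearson`, `combined_score`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def rel_prob(keys, labels):
    """Estimate P(relevant | key) from training judgements with add-one smoothing."""
    total, rel = Counter(), Counter()
    for key, label in zip(keys, labels):
        total[key] += 1
        rel[key] += label
    return {key: (rel[key] + 1) / (total[key] + 2) for key in total}


def independent_prior(a, b, p_a, p_b):
    """Heuristic combination: multiply single priors, assuming A and B are independent."""
    return p_a[a] * p_b[b]


def conditional_prior(a, b, p_ab, p_a):
    """Conditional combination: P(relevant | a, b), i.e. B's evidence re-estimated
    within A's bin; falls back to the single prior for A when (a, b) was never seen."""
    return p_ab.get((a, b), p_a[a])


def pearson(x, y):
    """Pearson correlation, used here to check whether two priors are dependent."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / sqrt(vx * vy)


def combined_score(rsv, prior):
    """Add the log of a prior to a query-dependent score (assumed integration scheme)."""
    return rsv + log(prior)


if __name__ == "__main__":
    # Toy, hypothetical training data: binned inlink counts, binned PageRank, labels.
    inlinks = [0, 0, 1, 1, 2, 2, 2, 3]
    pagerank = [0, 0, 1, 1, 2, 2, 3, 3]
    labels = [0, 0, 0, 1, 1, 0, 1, 1]

    print("correlation:", round(pearson(inlinks, pagerank), 3))

    p_in = rel_prob(inlinks, labels)
    p_pr = rel_prob(pagerank, labels)
    p_joint = rel_prob(list(zip(inlinks, pagerank)), labels)

    # Score one document with inlink bin 2 and PageRank bin 3 (rsv is made up).
    rsv = 7.5
    print("independent:", combined_score(rsv, independent_prior(2, 3, p_in, p_pr)))
    print("conditional:", combined_score(rsv, conditional_prior(2, 3, p_joint, p_in)))
```

The design choice being illustrated is where the dependence lives: the independent heuristic multiplies two separately estimated tables, so evidence shared by correlated priors is counted twice, while the conditional table estimates the second prior's effect only among documents in the same bin of the first. The joint table needs more training data per cell, which is why the sketch falls back to a single prior for unseen combinations.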