Combination of document priors in web information retrieval

Authors:
Jie Peng;Craig Macdonald;Ben He;Iadh Ounis
Affiliations:
University of Glasgow, Glasgow;University of Glasgow, Glasgow;University of Glasgow, Glasgow;University of Glasgow, Glasgow
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

Query-independent features (also called document priors), such as the number of incoming links to a document, its PageRank, or the type of its associated URL, have been successfully integrated into Web Information Retrieval systems to enhance retrieval effectiveness. Combining several document priors could further enhance retrieval performance. However, most current approaches to combining priors are based on heuristics and often ignore the possible dependence between the document priors. In this paper, we present a novel and robust method for conditionally combining document priors in a principled way. The approach adjusts the distribution of document priors for one source of evidence according to the distribution of document priors for other sources of evidence. We compare the retrieval performance attainable by our combination method with that of single priors and of a heuristic combination method that assumes the document priors are independent. Furthermore, we investigate how sensitive the proposed method is to the training data. Using two standard Web test collections, including the large-scale .GOV2 test collection, we find that some of the document priors used in our experiments are highly correlated, suggesting that the dependency between document priors should indeed be taken into account. Through extensive experiments on these two large-scale collections, we observe that our proposed conditional combination method is effective and robust overall.
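
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea it contrasts: combining two priors under an independence assumption versus conditioning one on the other via the chain rule, P(a, b) = P(a) P(b | a). The feature names (`"high"`/`"low"` inlink counts, `"short"`/`"long"` URLs), the toy counts, and the functions `estimate`, `independent`, and `conditional` are all hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter

def estimate(samples):
    """Count joint and marginal occurrences of two discrete features.

    `samples` is a list of (a, b) pairs, e.g. the feature values observed
    for relevant documents in a training set (an assumption of this sketch).
    """
    n = len(samples)
    joint = Counter(samples)
    marg_a = Counter(a for a, _ in samples)
    marg_b = Counter(b for _, b in samples)
    return n, joint, marg_a, marg_b

def independent(a, b, stats):
    """Combined prior assuming the two features are independent: P(a) * P(b)."""
    n, _, marg_a, marg_b = stats
    return (marg_a[a] / n) * (marg_b[b] / n)

def conditional(a, b, stats):
    """Combined prior via the chain rule: P(a) * P(b | a)."""
    n, joint, marg_a, _ = stats
    if marg_a[a] == 0:
        return 0.0  # feature value a never observed in training
    return (marg_a[a] / n) * (joint[(a, b)] / marg_a[a])

# Hypothetical toy training data in which the two features are correlated:
# documents with many inlinks tend to have short URLs.
toy_samples = (
    [("high", "short")] * 6
    + [("high", "long")] * 1
    + [("low", "short")] * 1
    + [("low", "long")] * 2
)
toy_stats = estimate(toy_samples)

if __name__ == "__main__":
    print("independent:", independent("high", "short", toy_stats))
    print("conditional:", conditional("high", "short", toy_stats))
```

On this toy data the independence assumption gives roughly 0.49 for a document with many inlinks and a short URL, while the conditional estimate gives roughly 0.60, which matches the observed joint frequency of 6 in 10. The gap illustrates why correlated priors may be poorly served by a combination that treats them as independent.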