Content-based relevance estimation on the web using inter-document similarities

Authors:
Fiana Raiber; Oren Kurland; Moshe Tennenholtz
Affiliations:
Technion, Haifa, Israel; Technion, Haifa, Israel; Microsoft Research and Technion, Haifa, Israel
Venue:
Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Year:
2012



Abstract

In adversarial and noisy search settings such as the Web, surface-level document-query similarity can be a highly misleading relevance signal, which makes devising content-based relevance estimation (ranking) approaches highly challenging. We address this challenge with two methods that utilize inter-document similarities in an initially retrieved list. The first removes from the list documents that exhibit high query similarity but lack sufficient additional support for relevance from inter-document similarities; it is based on a probabilistic model that decouples document-query similarities from relevance estimation. The second re-ranks the list by "rewarding" documents that exhibit high similarity both to the query and to other documents in the list. Both methods also incorporate, at the model level, query-independent document quality estimates. Extensive empirical evaluation demonstrates the merits of our methods.
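
The two ideas in the abstract can be illustrated with a toy sketch. It is not the authors' probabilistic model: the abstract does not specify the scoring functions, so the bag-of-words cosine similarity, the mean-similarity `support` measure, the multiplicative score, and the `min_support` threshold below are all assumptions made for illustration, and every function name is hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def support(i: int, vecs: list[Counter]) -> float:
    """Mean similarity of document i to the other documents in the list."""
    others = [cosine(vecs[i], v) for j, v in enumerate(vecs) if j != i]
    return sum(others) / len(others) if others else 0.0


def rerank(
    query: str,
    docs: list[str],
    quality: list[float] | None = None,
    min_support: float = 0.0,
) -> list[int]:
    """Return document indices, best first.

    Assumed scoring (illustrative only):
        score = query similarity * inter-document support * quality prior.
    Documents whose support falls below min_support are dropped, a crude
    stand-in for the removal method described in the abstract.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    q = Counter(query.lower().split())
    quality = quality or [1.0] * len(docs)

    scored = []
    for i, vec in enumerate(vecs):
        s = support(i, vecs)
        if s < min_support:
            continue  # little corroboration from the rest of the list
        scored.append((cosine(q, vec) * s * quality[i], i))

    # Sort by descending score; break ties by original rank.
    return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))]
```

In this sketch, a keyword-stuffed page such as `"jaguar jaguar jaguar jaguar"` has the highest similarity to the query `"jaguar"`, but it can be filtered out by raising `min_support` or demoted by assigning it a low entry in `quality`.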