Quality-biased ranking of web documents

  • Authors:
  • Michael Bendersky;W. Bruce Croft;Yanlei Diao

  • Affiliations:
  • University of Massachusetts, Amherst, MA, USA;University of Massachusetts, Amherst, MA, USA;University of Massachusetts, Amherst, MA, USA

  • Venue:
  • Proceedings of the fourth ACM international conference on Web search and data mining
  • Year:
  • 2011

Quantified Score

Hi-index 0.01

Visualization

Abstract

Many existing retrieval approaches do not take into account the content quality of the retrieved documents, although link-based measures such as PageRank are commonly used as a form of document prior. In this paper, we present the quality-biased ranking method that promotes documents containing high-quality content, and penalizes low-quality documents. The quality of the document content can be determined by its readability, layout and ease-of-navigation, among other factors. Accordingly, instead of using a single estimate for document quality, we consider multiple content-based features that are directly integrated into a state-of- the-art retrieval method. These content-based features are easy to compute, store and retrieve, even for large web collections. We use several query sets and web collections to empirically evaluate the performance of our quality-biased retrieval method. In each case, our method consistently improves by a large margin the retrieval performance of text-based and link-based retrieval methods that do not take into account the quality of the document content.