Quality-biased ranking of web documents

Authors:
Michael Bendersky; W. Bruce Croft; Yanlei Diao
Affiliations:
University of Massachusetts, Amherst, MA, USA; University of Massachusetts, Amherst, MA, USA; University of Massachusetts, Amherst, MA, USA
Venue:
Proceedings of the fourth ACM international conference on Web search and data mining
Year:
2011


Abstract

Many existing retrieval approaches do not take into account the content quality of the retrieved documents, although link-based measures such as PageRank are commonly used as a form of document prior. In this paper, we present a quality-biased ranking method that promotes documents containing high-quality content and penalizes low-quality documents. The quality of document content can be determined by its readability, layout, and ease of navigation, among other factors. Accordingly, instead of using a single estimate of document quality, we consider multiple content-based features that are directly integrated into a state-of-the-art retrieval method. These content-based features are easy to compute, store, and retrieve, even for large web collections. We use several query sets and web collections to empirically evaluate the performance of our quality-biased retrieval method. In each case, our method improves, by a large margin, on the retrieval performance of text-based and link-based retrieval methods that do not take into account the quality of the document content.
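
The idea of combining a query-dependent text score with several content-based quality features can be illustrated with a minimal sketch. This is not the authors' implementation: the three features (stopword fraction, average term length, term entropy), the stopword list, the linear combination, the hand-set weights, and all function names are assumptions chosen for illustration only. The abstract does not specify which features are used or how their weights are obtained.

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def quality_features(text):
    """Compute a few illustrative content-based features of a document."""
    terms = text.lower().split()
    if not terms:
        return {"stopword_fraction": 0.0, "avg_term_length": 0.0, "entropy": 0.0}
    n = len(terms)
    counts = Counter(terms)
    # Shannon entropy (in bits) of the document's term distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "stopword_fraction": sum(t in STOPWORDS for t in terms) / n,
        "avg_term_length": sum(len(t) for t in terms) / n,
        "entropy": entropy,
    }


def quality_biased_score(text_score, features, weights):
    """Add a weighted sum of quality features to a query-dependent text score.

    The linear form and the weights are assumptions of this sketch.
    """
    return text_score + sum(weights[name] * value for name, value in features.items())


def rank(docs, text_scores, weights):
    """Return document ids sorted by descending quality-biased score."""
    scored = [
        (quality_biased_score(text_scores[doc_id], quality_features(text), weights), doc_id)
        for doc_id, text in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


if __name__ == "__main__":
    docs = {
        "good": "the quality of a page is easy to read",
        "spam": "cheap cheap cheap cheap cheap cheap",
    }
    # Both documents get the same text score, so only quality separates them.
    text_scores = {"good": 1.0, "spam": 1.0}
    weights = {"stopword_fraction": 1.0, "avg_term_length": 0.0, "entropy": 0.5}
    print(rank(docs, text_scores, weights))
```

In this toy setup the repetitive document has zero term entropy and no stopwords, so it ranks below the document with more natural text even though both have the same text score. In practice the weights would be learned from relevance judgments rather than set by hand.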