Pseudo test collections for learning web search ranking functions

Authors:
Nima Asadi; Donald Metzler; Tamer Elsayed; Jimmy Lin
Affiliations:
University of Maryland, College Park, MD, USA; University of Southern California, Marina del Rey, CA, USA; King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; University of Maryland, College Park, MD, USA
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

Test collections are the primary drivers of progress in information retrieval. They provide yardsticks for assessing the effectiveness of ranking functions in an automatic, rapid, and repeatable fashion and serve as training data for learning-to-rank models. However, manual construction of test collections tends to be slow, labor-intensive, and expensive. This paper examines the feasibility of constructing web search test collections in a completely unsupervised manner given only a large web corpus as input. Within our proposed framework, anchor text extracted from the web graph is treated as a pseudo query log from which pseudo queries are sampled. For each pseudo query, a set of relevant and non-relevant documents is selected using a variety of web-specific features, including spam and aggregated anchor text weights. The automatically mined queries and judgments form a pseudo test collection that can be used for training ranking functions. Experiments carried out on TREC web track data show that learning-to-rank models trained using pseudo test collections outperform an unsupervised ranking function and are statistically indistinguishable from a model trained using manual judgments, demonstrating the usefulness of our approach in extracting reasonable-quality training data "for free".
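The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of that pipeline under stated assumptions. It is not the authors' implementation. Links are assumed to arrive as hypothetical (source_url, target_url, anchor_text) triples. The aggregated anchor text weight of a (phrase, document) pair is assumed to be the number of distinct source pages that use that phrase. Spam scores are a hypothetical dictionary of values in [0, 1]. The labeling rule in `judge` (the `min_weight` and `max_spam` thresholds, plus random sampling of non-relevant documents) is an illustrative heuristic and not the paper's criteria. Pseudo queries are drawn from the anchor-text "log" by weighted sampling without replacement using Efraimidis-Spirakis keys.

```python
import random
from collections import defaultdict

# Hypothetical input format: (source_url, target_url, anchor_text) link triples.
Link = tuple[str, str, str]


def aggregate_anchor_text(links: list[Link]) -> dict[str, dict[str, int]]:
    """Map each normalized anchor phrase to {target_url: weight}.

    Assumed weight: number of distinct source pages linking to the target
    with that phrase (a stand-in for aggregated anchor text weights).
    """
    sources: dict = defaultdict(lambda: defaultdict(set))
    for src, dst, text in links:
        phrase = " ".join(text.lower().split())  # lowercase, collapse whitespace
        if phrase and src != dst:  # skip empty anchors and self-links
            sources[phrase][dst].add(src)
    return {p: {d: len(s) for d, s in t.items()} for p, t in sources.items()}


def sample_pseudo_queries(anchors: dict, k: int, rng: random.Random) -> list[str]:
    """Draw k phrases without replacement, weighted by total anchor count,
    using Efraimidis-Spirakis keys u ** (1 / w)."""
    keyed = []
    for phrase, targets in anchors.items():
        weight = sum(targets.values())  # >= 1 for every stored phrase
        keyed.append((rng.random() ** (1.0 / weight), phrase))
    keyed.sort(reverse=True)  # largest keys win
    return [phrase for _, phrase in keyed[:k]]


def judge(query, anchors, spam, all_docs, n_nonrel, rng,
          min_weight=2, max_spam=0.5) -> dict[str, int]:
    """Hypothetical labeling rule (not the paper's exact criteria):
    relevant (1) = non-spam targets of the phrase with weight >= min_weight;
    non-relevant (0) = random non-spam documents never linked with the phrase."""
    targets = anchors[query]
    relevant = [d for d, w in sorted(targets.items())
                if w >= min_weight and spam.get(d, 0.0) <= max_spam]
    pool = sorted(d for d in all_docs
                  if d not in targets and spam.get(d, 0.0) <= max_spam)
    nonrel = rng.sample(pool, min(n_nonrel, len(pool)))
    labels = {d: 1 for d in relevant}
    labels.update({d: 0 for d in nonrel})  # disjoint: pool excludes all targets
    return labels


def build_pseudo_collection(links, spam, k=10, n_nonrel=2, seed=0):
    """Return pseudo qrels: {pseudo_query: {doc_url: 0 or 1}}."""
    rng = random.Random(seed)
    anchors = aggregate_anchor_text(links)
    all_docs = {src for src, _, _ in links} | {dst for _, dst, _ in links}
    qrels = {}
    for query in sample_pseudo_queries(anchors, k, rng):
        labels = judge(query, anchors, spam, all_docs, n_nonrel, rng)
        if 1 in labels.values():  # keep only queries with a relevant document
            qrels[query] = labels
    return qrels


# Toy usage with made-up URLs and spam scores.
links = [
    ("a.com", "umd.edu", "University of Maryland"),
    ("b.com", "umd.edu", "university of  maryland"),
    ("c.com", "umd.edu", "UMD"),
    ("a.com", "spam.biz", "University of Maryland"),
    ("b.com", "spam.biz", "university of maryland"),
    ("d.com", "trec.nist.gov", "TREC"),
    ("e.com", "trec.nist.gov", "trec"),
]
spam = {"spam.biz": 0.9}
qrels = build_pseudo_collection(links, spam)
print(qrels)
```

In this toy run, "university of maryland" keeps umd.edu as relevant and filters out the spammy spam.biz. "trec" keeps trec.nist.gov. "umd" is dropped because its only target falls below the assumed `min_weight`. Weighting the sampler by anchor frequency makes popular phrases more likely to become pseudo queries. This loosely imitates how a real query log over-represents popular queries. The resulting qrels could then be handed to any learning-to-rank trainer.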