Much research in learning to rank has focused on developing sophisticated learning methods, treating the training set as a given. However, the number of judgments in the training set directly affects the quality of the learned system. Given the expense of obtaining relevance judgments for constructing training data, one often has a limited budget for how many judgments can be collected. The main problem then is how to distribute this judgment effort across different queries. In this paper, we investigate the tradeoff between the number of queries and the number of judgments per query when training sets are constructed. In particular, we show that up to a limit, training sets with more queries but shallow (fewer) judgments per query are more cost-effective than training sets with fewer queries but deep (more) judgments per query.
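The budget-allocation question can be made concrete: for a fixed total number of judgments, each choice of judgment depth determines how many queries can be covered. The sketch below (a hypothetical illustration, not code from the paper) enumerates the feasible (queries, depth) splits that exhaust a given budget:

```python
def allocations(budget, max_depth):
    """Enumerate (num_queries, judgments_per_query) splits of a judgment budget.

    For each depth d up to max_depth, the number of queries that can be
    covered is budget // d. Shallow depths yield many queries; deep
    depths yield few.
    """
    return [(budget // d, d) for d in range(1, max_depth + 1) if budget // d > 0]

# Example: a budget of 1000 judgments, comparing shallow vs. deep strategies.
for num_queries, depth in allocations(1000, 5):
    print(f"{num_queries} queries x {depth} judgments/query")
```

Under this framing, the paper's finding is that, up to a point, the shallow end of this spectrum (e.g. 1000 queries at depth 1) trains a better ranker per unit cost than the deep end (e.g. 200 queries at depth 5).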