In the traditional evaluation of information retrieval systems, assessors are asked to judge the relevance of a document on a graded scale, independently of any other documents. Such judgments are absolute judgments. Learning to rank brings new challenges to this traditional evaluation methodology, especially regarding absolute relevance judgments. Recently, preference judgments have been investigated as an alternative: instead of assigning a relevance grade to a document, an assessor looks at a pair of pages and judges which one is better. In this paper, we generalize pairwise preference judgments to relative judgments. We formulate the problem of relative judgments formally and then propose a new strategy, called Select-the-Best-Ones, to solve it. Through user studies, we compare the proposed method with a pairwise preference judgment method and an absolute judgment method. The results indicate that users can distinguish about one more relevance degree with the relative methods than with the absolute method. Consequently, the relative methods generate 15-30% more document pairs for learning to rank. Compared with the pairwise method, the proposed method increases the agreement among assessors from 95% to 99%, while halving both the labeling time and the number of pairs discordant with experts' judgments.
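To make the "more document pairs" claim concrete, here is a minimal sketch (not the paper's Select-the-Best-Ones algorithm; the tier groupings and document names are illustrative assumptions) of how judgments that place documents into ordered relevance tiers induce training pairs for learning to rank: every document in a higher tier is preferred over every document in a lower tier, so a judgment method that distinguishes one extra relevance degree splits a tie and yields additional pairs.

```python
from itertools import combinations  # imported for clarity; only nested loops are used below

def preference_pairs(tiers):
    """Given an ordered partition of documents into relevance tiers
    (most relevant tier first), return every cross-tier preference pair
    (preferred, less_preferred). Documents within the same tier are
    tied and contribute no pair."""
    pairs = []
    for i, upper in enumerate(tiers):
        for lower in tiers[i + 1:]:
            for a in upper:
                for b in lower:
                    pairs.append((a, b))
    return pairs

# Hypothetical absolute judgments: six documents on a 3-grade scale.
absolute = [["d1", "d2"], ["d3", "d4"], ["d5", "d6"]]
# Hypothetical relative judgments of the same documents, resolving
# one extra relevance degree (5 tiers instead of 3).
relative = [["d1"], ["d2"], ["d3", "d4"], ["d5"], ["d6"]]

print(len(preference_pairs(absolute)))  # 12
print(len(preference_pairs(relative)))  # 14
```

In this toy setup the finer relative grouping yields 14 pairs versus 12, about 17% more, in line with the 15-30% range reported in the abstract.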