Relevance dimensions in preference-based IR evaluation

Authors:
Jinyoung Kim;Gabriella Kazai;Imed Zitouni
Affiliations:
Microsoft, Bellevue, WA, USA;Microsoft Research, Cambridge, United Kingdom;Microsoft, Bellevue, WA, USA
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Citing 11
Cited 1

Dimensions of relevance

Information Processing and Management: an International Journal
The concept of relevance in IR

Journal of the American Society for Information Science and Technology
Relevance judgment: What do information users consider beyond topicality?

Journal of the American Society for Information Science and Technology - Research Articles
Evaluation by comparing result sets in context

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
The relationship between IR effectiveness measures and user satisfaction

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance

Journal of the American Society for Information Science and Technology
Here or there: preference judgments for relevance

ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Do user preferences and evaluation measures line up?

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Comparing the sensitivity of information retrieval metrics

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Evaluating search systems using result page context

Proceedings of the third symposium on Information interaction in context
Using preference judgments for novel document retrieval

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

Contextual and dimensional relevance judgments for reusable SERP-level evaluation

Proceedings of the 23rd international conference on World wide web

Quantified Score

Hi-index	0.00

Visualization

Abstract

Evaluation of information retrieval (IR) systems has recently been exploring the use of preference judgments over two search result lists. Unlike the traditional method of collecting relevance labels per single result, this method allows to consider the interaction between search results as part of the judging criteria. For example, one result list may be preferred over another if it has a more diverse set of relevant results, covering a wider range of user intents. In this paper, we investigate how assessors determine their preference for one list of results over another with the aim to understand the role of various relevance dimensions in preference-based evaluation. We run a series of experiments and collect preference judgments over different relevance dimensions in side-by-side comparisons of two search result lists, as well as relevance judgments for the individual documents. Our analysis of the collected judgments reveals that preference judgments combine multiple dimensions of relevance that go beyond the traditional notion of relevance centered on topicality. Measuring performance based on single document judgments and NDCG aligns well with topicality based preferences, but shows misalignment with judges' overall preferences, largely due to the diversity dimension. As a judging method, dimensional preference judging is found to lead to improved judgment quality.