Do user preferences and evaluation measures line up?

  • Authors:
  • Mark Sanderson; Monica Lestari Paramita; Paul Clough; Evangelos Kanoulas

  • Affiliations:
  • University of Sheffield, Sheffield, United Kingdom (all authors)

  • Venue:
  • Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2010

Abstract

This paper presents results comparing user preference for search engine rankings with measures of effectiveness computed from a test collection. It establishes that preferences and evaluation measures correlate: systems measured as better on a test collection are preferred by users. This correlation is established both for "conventional web retrieval" and for retrieval that emphasizes diverse results. The nDCG measure is found to correlate best with user preferences compared with a selection of other well-known measures. Unlike previous studies in this area, the examination involved a large population of users, recruited through crowdsourcing, who were exposed to a wide range of retrieval systems, test collections and search tasks. Reasons for user preferences were also gathered and analyzed. The work revealed a number of new results, but also showed that there is much scope for future work in refining effectiveness measures to better capture user preferences.
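
For readers unfamiliar with the measure the abstract highlights, the sketch below shows one common formulation of nDCG (log2 discount, linear gain). It is an illustrative sketch only; the exact gain function, discount base and cutoff used in the paper are not specified in the abstract, and the example relevance grades are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance scores.

    Uses the common log2(rank + 2) discount (rank is 0-based here).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_relevances, k=None):
    """nDCG: DCG of the observed ranking divided by DCG of the ideal ranking.

    `ranked_relevances` are the graded judgments of the documents in the
    order the system returned them; the ideal ranking sorts them descending.
    """
    if k is not None:
        ranked_relevances = ranked_relevances[:k]
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: graded relevance (0-3) of a system's top five results.
print(ndcg([3, 2, 3, 0, 1]))  # compared against the ideal ordering [3, 3, 2, 1, 0]
```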