How evaluator domain expertise affects search result relevance judgments

  • Authors:
  • Kenneth A. Kinney; Scott B. Huffman; Juting Zhai

  • Affiliations:
  • Google, Inc., Mountain View, CA, USA (all authors)

  • Venue:
  • Proceedings of the 17th ACM conference on Information and knowledge management
  • Year:
  • 2008

Abstract

Traditional search evaluation approaches have often relied on domain experts to evaluate results for each query. Unfortunately, the range of topics present in any representative sample of web queries makes it impractical to have expert evaluators for every topic. In this paper, we investigate the effect of using "generalist" evaluators instead of experts in the domain of the queries being evaluated. Empirically, we find that for queries drawn from domains requiring high expertise: (1) generalists tend to give shallow, inaccurate ratings as compared to experts; (2) generalists disagree on the underlying meaning of these queries significantly more often than experts, and often appear to "give up" and fall back on surface features such as keyword matching; and (3) by estimating the percentage of "expertise-requiring" queries in a web query sample, we estimate the impact of using generalists versus the ideal of having domain experts for every expertise-requiring query.
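
The abstract turns on two quantities: inter-rater agreement (finding 2) and an aggregate impact estimate (finding 3). As a minimal sketch of how such numbers are commonly computed, the Python below derives Cohen's kappa for pairs of raters and a back-of-the-envelope impact figure. The rating data, the 3-point scale, and every numeric parameter here are hypothetical placeholders, not the paper's actual data, methodology, or results.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels on a 3-point relevance scale (2=good, 1=fair, 0=bad)
# for ten query-result pairs; chosen only to illustrate the computation.
expert_1     = [2, 2, 1, 0, 2, 1, 1, 0, 2, 1]
expert_2     = [2, 2, 1, 0, 1, 1, 1, 0, 2, 1]
generalist_1 = [2, 1, 0, 1, 2, 2, 1, 0, 1, 1]
generalist_2 = [2, 1, 0, 0, 0, 1, 1, 1, 2, 0]

print("expert/expert kappa:        ", round(cohens_kappa(expert_1, expert_2), 2))
print("generalist/generalist kappa:", round(cohens_kappa(generalist_1, generalist_2), 2))

# Back-of-the-envelope impact estimate in the spirit of finding (3): the extra
# rating error from using generalists is roughly the share of expertise-requiring
# queries times the per-query accuracy gap. All values are placeholders.
p_expertise_required = 0.20                   # assumed share of such queries
expert_error, generalist_error = 0.05, 0.30   # assumed per-query error rates
impact = p_expertise_required * (generalist_error - expert_error)
print(f"estimated extra error across all queries: {impact:.1%}")
```

On this reading, even a large per-query accuracy gap dilutes to a modest aggregate effect when expertise-requiring queries are a small share of the sample, which is why the paper's estimate of that share is central to finding (3).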