Evaluation by highly relevant documents

  • Authors: Ellen M. Voorhees
  • Affiliations: National Institute of Standards and Technology, Gaithersburg, MD
  • Venue: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year: 2001


Abstract

Given the size of the web, the search engine industry has argued that engines should be evaluated by their ability to retrieve highly relevant pages rather than all possible relevant pages. To explore the role highly relevant documents play in retrieval system evaluation, assessors for the TREC-9 web track used a three-point relevance scale and also selected best pages for each topic. The relative effectiveness of runs evaluated by different relevant document sets differed, confirming the hypothesis that different retrieval techniques work better for retrieving highly relevant documents. Yet evaluating by highly relevant documents can be unstable since there are relatively few highly relevant documents. TREC assessors frequently disagreed in their selection of the best page, and subsequent evaluation by best page across different assessors varied widely. The discounted cumulative gain measure introduced by Järvelin and Kekäläinen increases evaluation stability by incorporating all relevance judgments while still giving precedence to highly relevant documents.
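
The discounted cumulative gain measure mentioned at the end of the abstract can be illustrated with a short sketch. The Python code below is a minimal sketch of the Järvelin and Kekäläinen formulation, assuming a log base of 2 and the three-point gain values (0, 1, 2) of the TREC-9 judgments; the ranked list in the example is hypothetical, not data from the paper.

    import math

    def dcg(relevances, cutoff=None):
        # Discounted cumulative gain over a ranked list of graded relevance
        # values. In the original Jarvelin & Kekalainen formulation with log
        # base 2, the document at rank 1 is not discounted and every later
        # rank i contributes rel_i / log2(i).
        if cutoff is not None:
            relevances = relevances[:cutoff]
        score = 0.0
        for rank, rel in enumerate(relevances, start=1):
            score += rel if rank == 1 else rel / math.log2(rank)
        return score

    # Hypothetical run judged on TREC-9's three-point scale
    # (0 = not relevant, 1 = relevant, 2 = highly relevant).
    print(dcg([2, 1, 0, 1, 2]))  # approximately 4.36

Because every judged document contributes some gain while higher grades contribute more, a single disputed best page has less influence on the score than under a best-page-only evaluation, which is the stability property the abstract highlights.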