A study on performance volatility in information retrieval

  • Authors:
  • Mehdi Hosseini

  • Affiliations:
  • University College London, London, United Kingdom

  • Venue:
  • Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2009

Abstract

A common practice in the comparative evaluation of information retrieval (IR) systems is to create a test collection comprising a set of topics (queries), a document corpus, and relevance judgments, and to measure the performance of retrieval systems over such a collection. A typical evaluation of a system involves computing a performance metric, e.g., Average Precision (AP), for each topic and then using the average of that metric, e.g., Mean Average Precision (MAP), to express overall system performance. However, averages do not capture all the important aspects of system performance and, used alone, may not fully express system effectiveness: an average can mask large variance in individual topic effectiveness. Our hypothesis is that, in addition to average overall performance, attention needs to be paid to how a system's performance varies across topics. This variability can be measured by calculating the standard deviation (SD) of the individual per-topic scores. We refer to this performance variation as Volatility.
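As a rough illustration of the idea in the abstract, the sketch below (Python, with made-up per-topic AP values) computes MAP as the mean of per-topic AP and volatility as the standard deviation of those same scores. The system names and scores are hypothetical, and the choice of sample standard deviation is an assumption; the abstract does not specify how SD is computed. It is not the author's implementation, only a sketch of the measurement being described.

```python
import statistics

# Hypothetical per-topic Average Precision (AP) scores for two systems.
# Both systems have the same MAP, but "System B" is far more volatile.
ap_scores = {
    "System A": [0.42, 0.40, 0.38, 0.41, 0.39],
    "System B": [0.70, 0.10, 0.65, 0.05, 0.50],
}

for system, scores in ap_scores.items():
    map_score = statistics.mean(scores)    # Mean Average Precision (MAP)
    volatility = statistics.stdev(scores)  # SD of per-topic AP (sample SD assumed)
    print(f"{system}: MAP = {map_score:.3f}, volatility (SD) = {volatility:.3f}")
```

Running this prints identical MAP values (0.400) for both hypothetical systems while their volatility differs by an order of magnitude, which is the masking effect of averages that the abstract refers to.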