A critical investigation of recall and precision as measures of retrieval system performance

  • Authors: Vijay Raghavan; Peter Bollmann; Gwang S. Jung

  • Affiliations: Univ. of Southwestern Louisiana, Lafayette; Technische Univ. Berlin, Berlin, W. Germany; Technische Univ. Berlin, Berlin, W. Germany

  • Venue: ACM Transactions on Information Systems (TOIS)
  • Year: 1989

Abstract

Recall and precision are often used to evaluate the effectiveness of information retrieval systems. They are easy to define if there is a single query and if the retrieval result generated for the query is a linear ordering. However, when the retrieval results are weakly ordered, in the sense that several documents have an identical retrieval status value with respect to a query, some probabilistic notion of precision has to be introduced. Relevance probability, expected precision, and so forth, are some alternatives mentioned in the literature for this purpose. Furthermore, when many queries are to be evaluated and the retrieval results averaged over these queries, some method of interpolation of precision values at certain preselected recall levels is needed. The currently popular approaches for handling both a weak ordering and interpolation are found to be inconsistent, and the results obtained are not easy to interpret. Moreover, in cases where some alternatives are available, no comparative analysis that would facilitate the selection of a particular strategy has been provided. In this paper, we systematically investigate the various problems and issues associated with the use of recall and precision as measures of retrieval system performance. Our motivation is to provide a comparative analysis of methods available for defining precision in a probabilistic sense and to promote a better understanding of the various issues involved in retrieval performance evaluation.
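
The difficulty with weakly ordered results can be made concrete with a small sketch. The code below is only an illustration under stated assumptions and is not one of the measures analyzed in the paper: the function names (`precision_recall_at`, `linearizations`, `mean_precision_at`), the document identifiers, and the choice of averaging precision at a fixed rank cutoff over every tie-breaking order are all hypothetical. It shows that a linear ordering gives a single precision value at a cutoff, whereas a weak ordering gives a value that depends on how ties are broken.

```python
from itertools import permutations, product


def precision_recall_at(ranking, relevant, k):
    """Precision and recall after the first k documents of a linear ordering."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


def linearizations(weak_ordering):
    """Yield every linear ordering consistent with a list of tied groups.

    weak_ordering is a list of groups; documents inside one group share
    the same retrieval status value (hypothetical input format).
    """
    for combo in product(*(permutations(group) for group in weak_ordering)):
        yield [doc for group in combo for doc in group]


def mean_precision_at(weak_ordering, relevant, k):
    """Average precision at cutoff k over all tie-breaking orders (brute force)."""
    values = [
        precision_recall_at(ranking, relevant, k)[0]
        for ranking in linearizations(weak_ordering)
    ]
    return sum(values) / len(values)


# Hypothetical data: d2 and d3 are tied at the second rank position.
relevant = {"d1", "d3"}
weak = [["d1"], ["d2", "d3"], ["d4"]]

# The two possible tie-breaks give different precision at cutoff 2.
p_first, r_first = precision_recall_at(["d1", "d2", "d3", "d4"], relevant, 2)
p_second, r_second = precision_recall_at(["d1", "d3", "d2", "d4"], relevant, 2)

# Averaging over both tie-breaks is one simple (assumed) way to get a single value.
p_mean = mean_precision_at(weak, relevant, 2)
```

In this toy case the two tie-breaks yield precision 0.5 and 1.0 at cutoff 2, so any single reported figure already embodies a choice about ties. The brute-force enumeration grows factorially with tie-group size and is meant only to expose the ambiguity, not to serve as a practical evaluation procedure.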