One of the challenges of modern information retrieval is to rank the most relevant documents at the top of a large system output, which calls for appropriate methods of evaluating system performance. Traditional performance measures such as precision and recall cannot distinguish between different levels of relevance because they are based only on binary relevance judgments. The main objective of this paper is to review 10 existing evaluation methods based on multi-grade relevance and to compare their similarities and differences through theoretical and numerical examination. We find that the normalized distance performance measure is the best choice: it is sensitive to document rank order and gives higher credit to systems for retrieving highly relevant documents. The cumulated gain-based methods rely on the total relevance score and are not sufficiently sensitive to document rank order.
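The contrast between the two families of measures can be made concrete with a small sketch. The Python code below is a minimal illustration, not the paper's own implementation: it assumes the widely used log2(i+1) rank discount for DCG and Yao's pairwise formulation of NDPM (2C- + Cu over 2C, counting contradicted and tied preference pairs); the function names, toy grades, and example runs are hypothetical.

from itertools import combinations
from math import log2

def dcg(grades):
    # Discounted cumulated gain: graded relevance discounted by log of rank.
    return sum(g / log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    # nDCG: DCG normalized by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

def ndpm(reference, system):
    # Normalized distance-based performance measure (lower is better).
    # reference, system: dicts mapping doc id -> score; every pair the
    # reference ranks strictly is checked against the system's ordering.
    contradicted = tied = judged = 0
    for a, b in combinations(reference, 2):
        ref_diff = reference[a] - reference[b]
        if ref_diff == 0:
            continue                      # reference states no preference
        judged += 1
        sys_diff = system[a] - system[b]
        if sys_diff == 0:
            tied += 1                     # system ties a preferred pair
        elif (ref_diff > 0) != (sys_diff > 0):
            contradicted += 1             # system reverses the preference
    return (2 * contradicted + tied) / (2 * judged) if judged else 0.0

# Two runs over the same documents (grades 0-3): run_a puts the highly
# relevant document first, run_b buries it at the bottom.
run_a = [3, 2, 1, 0]
run_b = [0, 2, 1, 3]
print(sum(run_a) == sum(run_b))    # True: full-depth CG cannot separate them
print(ndcg(run_a), ndcg(run_b))    # 1.0 vs. roughly 0.64

ref = {"d1": 3, "d2": 2, "d3": 1}            # graded user judgments
sys_run = {"d1": 0.2, "d2": 0.9, "d3": 0.5}  # system orders d2 > d3 > d1
print(ndpm(ref, sys_run))                    # 2/3: two of three pairs reversed

Both runs have the same total relevance score, so a plain cumulated gain over the full list cannot distinguish them, while the discounted and pairwise distance-based measures both penalize placing the grade-3 document last, which is the abstract's point about rank-order sensitivity.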