Using hit curves to compare search algorithm performance

Authors:
Jorge R. Herskovic; M. Sriram Iyengar; Elmer V. Bernstam
Affiliations:
School of Health Information Sciences, The University of Texas Health Science Center at Houston, Houston, TX, USA (all authors)
Venue:
Journal of Biomedical Informatics
Year:
2007

Abstract

Databases continue to grow, but the metrics available to evaluate information retrieval systems have not changed. Large collections such as MEDLINE and the World Wide Web contain many relevant documents for common queries. Ranking is therefore increasingly important, and successful information retrieval systems, such as Google, have emphasized ranking. However, existing evaluation metrics, such as precision and recall, do not directly account for ranking. This paper describes a novel way of measuring information retrieval performance using weighted hit curves, adapted from the field of statistical detection, to reflect multiple desirable characteristics such as relevance, importance, and methodologic quality. In statistical detection, hit curves have been proposed to represent the occurrence of interesting events during a detection process. Similarly, hit curves can be used to study the position of relevant documents within large result sets. We describe hit curves in light of a formal model of information retrieval, show how hit curves represent system performance including ranking, and define ways to statistically compare the performance of multiple systems using hit curves. We provide example scenarios where traditional measures are less suitable than hit curves and conclude that hit curves may be useful for evaluating retrieval from large collections where ranking performance is crucial.
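
The contrast drawn in the abstract between set-based measures and hit curves can be illustrated with a minimal sketch. The abstract does not give the paper's formal definitions, so this sketch assumes that a weighted hit curve is the running total of per-document weights down a ranked result list. The function names, document identifiers, and weights are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def hit_curve(ranked_ids, weights):
    """Running total of document weights at each rank position.

    Assumption: documents missing from `weights` are non-relevant and
    contribute 0. With all weights equal to 1.0 this is a plain count
    of relevant documents seen so far.
    """
    return list(accumulate(weights.get(doc, 0.0) for doc in ranked_ids))


def precision_recall(ranked_ids, relevant):
    """Set-based precision and recall over the whole result list."""
    found = sum(1 for doc in ranked_ids if doc in relevant)
    return found / len(ranked_ids), found / len(relevant)


# Hypothetical collection: three relevant documents, equally weighted.
weights = {"d1": 1.0, "d2": 1.0, "d3": 1.0}

# Two hypothetical systems return the same documents in different orders.
system_a = ["d1", "d2", "d3", "d4", "d5", "d6"]  # relevant documents first
system_b = ["d4", "d5", "d6", "d1", "d2", "d3"]  # relevant documents last

curve_a = hit_curve(system_a, weights)
curve_b = hit_curve(system_b, weights)

print(precision_recall(system_a, set(weights)))  # same pair for both systems
print(precision_recall(system_b, set(weights)))
print(curve_a)  # rises immediately
print(curve_b)  # stays flat until rank 4
```

Both result lists yield identical precision and recall, while the two curves differ at every rank before the last. Replacing the unit weights with graded values (for example, a higher weight for documents judged more important) gives a weighted variant under the same assumption.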