Variation of relevance assessments for medical image retrieval

  • Authors:
  • Henning Müller; Paul Clough; Bill Hersh; Antoine Geissbühler

  • Affiliations:
  • University and Hospitals of Geneva, Medical Informatics, Geneva, Switzerland; Department of Information Studies, Sheffield University, Sheffield, UK; Biomedical Informatics, Oregon Health and Science University, Portland, OR; University and Hospitals of Geneva, Medical Informatics, Geneva, Switzerland

  • Venue:
  • AMR'06 Proceedings of the 4th international conference on Adaptive multimedia retrieval: user, context, and feedback
  • Year:
  • 2006

Abstract

Evaluation is crucial for the success of most research domains, and image retrieval is no exception. Several benchmarks have recently been developed for visual information retrieval, such as TRECVID, ImageCLEF, and ImagEval, to create frameworks for evaluating image retrieval research. An important part of evaluation is the creation of a ground truth, or gold standard, against which systems are evaluated. Much experience has been gained in creating ground truths for textual information retrieval, but for image retrieval these issues require further research. This article presents the process of generating relevance judgements for the medical image retrieval task of ImageCLEF. Many of the problems encountered generalise to other image retrieval tasks as well, so the outcomes are not limited to the medical domain. A portion of the images analysed for relevance was judged by two assessors, and these judgements are analysed with respect to their consistency and potential problems. Our goal is to obtain more information on the ambiguity of the topics developed and, more generally, to keep the variation amongst relevance assessors low. This might partially reduce the subjectivity of system-oriented evaluation, although our evaluation shows that differences in relevance judgements have only a limited influence on comparative system rankings. A number of outcomes are presented with the goal of creating less ambiguous topics for future evaluation campaigns.
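
The abstract refers to two quantities: the consistency between two assessors' relevance judgements and the stability of comparative system rankings when the judgement set changes. As a purely illustrative sketch (not the authors' code), the Python below shows one common way such quantities might be measured, using Cohen's kappa for inter-assessor agreement and Kendall's tau for ranking stability; all assessor data, system names, and ranks are hypothetical placeholders.

```python
# Illustrative sketch: inter-assessor agreement and ranking stability.
# All judgments, system names, and ranks below are hypothetical.

from itertools import combinations


def cohens_kappa(judgments_a, judgments_b):
    """Cohen's kappa for two assessors' binary relevance judgments
    (parallel lists of 0/1 values over the same pool of images)."""
    n = len(judgments_a)
    observed = sum(a == b for a, b in zip(judgments_a, judgments_b)) / n
    # Chance agreement estimated from each assessor's marginal relevance rate.
    p_a = sum(judgments_a) / n
    p_b = sum(judgments_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same systems
    (dicts mapping system name -> rank position)."""
    systems = list(rank_a)
    concordant = discordant = 0
    for s1, s2 in combinations(systems, 2):
        product = (rank_a[s1] - rank_a[s2]) * (rank_b[s1] - rank_b[s2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    # Hypothetical binary judgments (1 = relevant) by two assessors on ten images.
    assessor_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    assessor_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    print("kappa:", round(cohens_kappa(assessor_1, assessor_2), 3))

    # Hypothetical system rankings obtained by scoring the same runs against
    # each assessor's judgment set; a tau close to 1 means the comparative
    # ranking is largely unaffected by the judgment differences.
    ranks_1 = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}
    ranks_2 = {"sysA": 1, "sysB": 3, "sysC": 2, "sysD": 4}
    print("tau:", round(kendall_tau(ranks_1, ranks_2), 3))
```

In this hedged example, a low kappa would indicate ambiguous topics or diverging relevance criteria, while a tau near 1 would support the abstract's observation that such differences have only a limited influence on comparative system rankings.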