Performance measures for multilabel evaluation: a case study in the area of image classification

  • Authors:
  • Stefanie Nowak; Hanna Lukashevich; Peter Dunker; Stefan Rüger

  • Affiliations:
  • Fraunhofer IDMT, Ilmenau, Germany; Fraunhofer IDMT, Ilmenau, Germany; Gracenote, Inc., Emeryville, CA, USA; Open University, Milton Keynes, England, UK

  • Venue:
  • Proceedings of the International Conference on Multimedia Information Retrieval

  • Year:
  • 2010

Abstract

With the steadily increasing number of multimedia documents on the web and at home, the need grows for reliable semantic indexing methods that assign multiple keywords to a document. The performance of existing approaches is often measured with standard evaluation measures from the information retrieval community. In a case study on image annotation, we show the behaviour of 13 different evaluation measures and point out their strengths and weaknesses. The analysis uses data from 19 research groups that participated in the ImageCLEF Photo Annotation Task, together with several configurations based on random numbers. We investigate a recently proposed ontology-based measure that incorporates structure information, relationships from the ontology, and the agreement between annotators for a concept, and compare it to a hierarchical variant. The results for the hierarchical measure are not competitive. The ontology-based measure assigns good scores to the systems that also rank well under other measures such as the example-based F-measure. For concept-based evaluation, MAP yields stable results with respect to random numbers and the number of annotated labels. The AUC measure shows good evaluation characteristics when all annotations contain confidence values.
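
To make the distinction between example-based and concept-based evaluation concrete, the following is a minimal sketch of two of the measures named in the abstract: the example-based F-measure (averaged over images) and MAP (averaged over concepts). The abstract does not give the paper's exact formulas, so the function names, the toy data, and the convention for examples with no labels are assumptions made here for illustration only.

```python
def example_based_f1(y_true, y_pred):
    """Mean per-example F1 over binary label rows (one row per image)."""
    scores = []
    for truth, pred in zip(y_true, y_pred):
        tp = sum(t and p for t, p in zip(truth, pred))
        denom = sum(truth) + sum(pred)
        # Assumed convention: an example with no true and no predicted
        # labels counts as a perfect match.
        scores.append(2 * tp / denom if denom else 1.0)
    return sum(scores) / len(scores)


def average_precision(relevant, scores):
    """Average precision for one concept, ranking images by descending score."""
    ranked = sorted(zip(scores, relevant), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, rel) in enumerate(ranked, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    # Assumed convention: a concept with no relevant images scores 0.
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(y_true, y_score):
    """MAP: mean of per-concept average precision (columns are concepts)."""
    n_concepts = len(y_true[0])
    aps = [
        average_precision([row[c] for row in y_true],
                          [row[c] for row in y_score])
        for c in range(n_concepts)
    ]
    return sum(aps) / n_concepts


# Toy data (hypothetical): 2 images, 3 concepts.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]                    # binary decisions
y_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.7]]       # confidence values

print(example_based_f1(y_true, y_pred))
print(mean_average_precision(y_true, y_score))
```

The example-based measure needs only binary decisions per image, whereas MAP ranks images per concept and therefore relies on confidence values, which is why the two can order systems differently.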