Modeling Human Judgment of Digital Imagery for Multimedia Retrieval

Authors:
T. Volkmer; J. A. Thom; S. M. M. Tahaghoghi
Affiliations:
RMIT Univ., Melbourne;-;-
Venue:
IEEE Transactions on Multimedia
Year:
2007

Citing 0
Cited 9

Concept-Based Video Retrieval
Foundations and Trends in Information Retrieval

On the Feasibility of a Tag-Based Approach for Deciding Which Objects a Picture Shows: An Empirical Study
SAMT '09 Proceedings of the 4th International Conference on Semantic and Digital Media Technologies: Semantic Multimedia

How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation
Proceedings of the international conference on Multimedia information retrieval

Using manual and automated annotations to search images by semantic similarity
Multimedia Tools and Applications

On the creation of visual models for keywords through crowdsourcing
ACA'12 Proceedings of the 11th international conference on Applications of Electrical and Computer Engineering

Tag-based algorithms can predict human ratings of which objects a picture shows
Multimedia Tools and Applications

Automatic image tagging using two-layered Bayesian networks and mobile data from smart phones
Proceedings of the 10th International Conference on Advances in Mobile Computing & Multimedia

Crowdsourcing for affective-interaction in computer games
Proceedings of the 2nd ACM international workshop on Crowdsourcing for multimedia

A mobile picture tagging system using tree-structured layered Bayesian networks
Mobile Information Systems

Abstract

The application of machine learning techniques to image and video search has been shown to boost the performance of multimedia retrieval systems, and promises to lead to more generalized semantic search approaches. In particular, the availability of large training collections allows model-driven search using a substantial number of semantic concepts. The training collections are obtained in a manual annotation process where human raters review images and assign predefined semantic concept labels. Besides being prone to human error, manual image annotation is biased by the view of the individual annotator because visual information almost always leaves room for ambiguity. Ideally, several independent judgments are obtained per image, and the inter-rater agreement is assessed. While disagreement between ratings bears valuable information on the annotation quality, it complicates the task of clearly classifying rated images based on multiple judgments. In the absence of a gold standard, evaluating multiple judgments and resolving disagreement between raters is not trivial. In this paper, we present an approach using latent structure analysis to solve this problem. We apply latent class modeling to the annotation data collected during the TRECVID 2005 Annotation Forum, and demonstrate how to use this statistic to clearly classify each image on the basis of varying numbers of ratings. We use latent class modeling to quantify the annotation quality and discuss the results in comparison with the well-known Kappa inter-rater agreement measure.
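To make the idea of classifying images from a varying number of ratings concrete, the following is a minimal sketch of a two-class latent class model fitted with expectation-maximization. It is not the authors' model: it assumes binary ratings (concept present or absent), pooled and interchangeable raters, and ratings that are conditionally independent given the latent class. The function name `fit_latent_class`, the `counts` input format, and the starting values are all hypothetical choices made for illustration.

```python
def fit_latent_class(counts, iterations=200, eps=1e-6):
    """Fit a two-class latent class model by EM (illustrative sketch).

    counts: list of (positives, total) pairs, one per image; the number
            of ratings per image may vary.
    Returns (pi, p1, p0, posteriors), where
      pi = estimated share of images in the "concept present" class,
      p1 = P(positive rating | concept present),
      p0 = P(positive rating | concept absent),
      posteriors[i] = P(concept present | ratings of image i).
    """
    def clamp(x):
        # Keep probabilities away from 0 and 1 so likelihoods stay positive.
        return min(max(x, eps), 1.0 - eps)

    # Assumed starting values; p1 > p0 fixes which class means "present".
    pi, p1, p0 = 0.5, 0.8, 0.2
    posteriors = []
    for _ in range(iterations):
        # E-step: posterior class membership for each image.
        # The binomial coefficient cancels in the ratio and is omitted.
        posteriors = []
        for k, n in counts:
            like1 = pi * p1 ** k * (1.0 - p1) ** (n - k)
            like0 = (1.0 - pi) * p0 ** k * (1.0 - p0) ** (n - k)
            posteriors.append(like1 / (like1 + like0))

        # M-step: re-estimate parameters from posterior-weighted counts.
        pos1 = sum(r * k for r, (k, n) in zip(posteriors, counts))
        tot1 = sum(r * n for r, (k, n) in zip(posteriors, counts))
        pos0 = sum((1.0 - r) * k for r, (k, n) in zip(posteriors, counts))
        tot0 = sum((1.0 - r) * n for r, (k, n) in zip(posteriors, counts))
        pi = clamp(sum(posteriors) / len(posteriors))
        p1 = clamp(pos1 / max(tot1, eps))
        p0 = clamp(pos0 / max(tot0, eps))
    return pi, p1, p0, posteriors


if __name__ == "__main__":
    # Made-up example: (positive ratings, total ratings) per image.
    example = [(3, 3), (2, 3), (4, 4), (0, 2), (0, 3), (1, 4), (0, 1), (2, 2)]
    pi, p1, p0, post = fit_latent_class(example)
    for (k, n), r in zip(example, post):
        label = "present" if r >= 0.5 else "absent"
        print(f"{k}/{n} positive -> P(present) = {r:.3f} ({label})")
```

In this sketch each image is assigned to the class with the larger posterior probability, regardless of how many ratings it received, and the estimated `p1` and `p0` give a rough indication of how reliable the pooled ratings are. A fuller treatment would model individual raters and more than two classes.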