What is happening: annotating images with verbs

Authors:
Gang Tian;Genliang Guan;Zhiyong Wang;Dagan Feng
Affiliations:
School of Information Technologies, The University of Sydney, Sydney, Australia;School of Information Technologies, The University of Sydney, Sydney, Australia;School of Information Technologies, The University of Sydney, Sydney, Australia;School of Information Technologies, The University of Sydney, Sydney, Australia
Venue:
Proceedings of the 20th ACM international conference on Multimedia
Year:
2012

Abstract

Image annotation has been widely investigated as a way to discover the semantics of an image. However, most existing algorithms focus on noun tags (e.g., concepts and objects). Since an image is a snapshot of a real-world event, annotating images with verbs enables a richer understanding of an image. In this paper, we propose a data-driven approach to verb-oriented image annotation. First, we obtain verb candidates by generating search queries for a given image from its initial noun tags and establishing a sentence corpus from those queries. We use visualness to filter out tags that are not visually presentable (e.g., pain), and we divide tags into two categories (scene-based and object-based) so that linguistic rules can be imposed during verb extraction. Then we re-rank the candidate verbs using the tag context discovered from images in the MIRFlickr dataset that are both semantically and visually similar to the given image. Experimental results from a user study demonstrate that the proposed approach is promising.
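
The filtering and re-ranking steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything here is an assumption made for illustration: the function names `filter_visual_tags` and `rerank_verbs`, the visualness threshold, the precomputed visualness and corpus scores, and the linear blend of a verb's corpus score with its frequency among the tags of similar images. It is not the authors' method.

```python
from collections import Counter


def filter_visual_tags(tags, visualness, threshold=0.5):
    """Keep tags whose visualness score meets the threshold.

    `visualness` is a hypothetical precomputed mapping from tag to a
    score in [0, 1]; tags without a score are treated as non-visual.
    """
    return [t for t in tags if visualness.get(t, 0.0) >= threshold]


def rerank_verbs(candidate_scores, neighbor_tag_lists, weight=0.5):
    """Re-rank candidate verbs using the tags of similar images.

    Assumed scoring rule: blend each verb's corpus score with the
    fraction of neighbor images whose tag list contains that verb.
    """
    # Count each tag at most once per neighbor image.
    counts = Counter(tag for tags in neighbor_tag_lists for tag in set(tags))
    n = max(len(neighbor_tag_lists), 1)
    blended = {
        verb: (1 - weight) * score + weight * counts[verb] / n
        for verb, score in candidate_scores.items()
    }
    # Highest blended score first; ties broken alphabetically.
    return sorted(blended, key=lambda v: (-blended[v], v))


# Toy data, invented for this example.
noun_tags = filter_visual_tags(
    ["dog", "beach", "pain"],
    {"dog": 0.9, "beach": 0.8, "pain": 0.1},
)
ranked = rerank_verbs(
    {"run": 0.4, "sit": 0.5, "swim": 0.3},
    [["dog", "run", "beach"], ["run", "sand"], ["swim", "sea"]],
)
print(noun_tags)  # ['dog', 'beach']
print(ranked)     # ['run', 'swim', 'sit']
```

In this toy run, "sit" has the highest corpus score but never appears among the neighbor tags, so the context term moves "run" and "swim" ahead of it. How the similar images are retrieved and how the two signals are actually combined would follow the full paper.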