Given the overwhelming amount of multimedia content on the Web, methods for searching and understanding it through sentences are necessary. Representing content not only with labels but with sentences that express the relations among those labels lets users search with a story and understand multimedia more deeply. However, few existing works generate such sentences, because extracting objects' relations and producing grammatical text are difficult. We specifically examine the captions of images that are similar to an input image; such captions are expected to describe the input image to some degree. We therefore propose a novel approach that generates a sentential caption for an input image by summarizing those captions. An experiment on a dataset of images and text demonstrates that the proposed method can generate sentential captions.
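The summarization step can be illustrated with a minimal sketch. The abstract does not specify the summarization algorithm, so the following is only an assumed stand-in: after retrieving the captions of similar images, pick the single caption that is most similar on average to all the others under a bag-of-words cosine measure (a simple extractive "centroid" summary). The function name `summarize_captions` and the example captions are hypothetical.

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize_captions(captions):
    """Return the caption most similar on average to the others:
    a simple extractive stand-in for sentence summarization."""
    bows = [Counter(c.lower().split()) for c in captions]
    best = max(range(len(captions)),
               key=lambda i: sum(cosine(bows[i], bows[j])
                                 for j in range(len(bows)) if j != i))
    return captions[best]

# Captions retrieved from hypothetical similar images:
captions = [
    "a dog runs on the beach",
    "a brown dog running along the beach",
    "two people walk in a park",
]
print(summarize_captions(captions))  # -> "a dog runs on the beach"
```

The outlier caption about the park contributes little overlap, so the most representative beach caption is selected; a real system would instead fuse or regenerate the sentence rather than merely select one.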