Understanding images with natural sentences

Authors:
Yoshitaka Ushiku; Tatsuya Harada; Yasuo Kuniyoshi
Affiliations:
The University of Tokyo, Tokyo, Japan; The University of Tokyo / JST PRESTO, Tokyo, Japan; The University of Tokyo, Tokyo, Japan
Venue:
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Year:
2011

Abstract

We propose a novel system that generates sentence-level captions for general images. For people to make effective use of the vast number of images on the web, technology must be able to explain image contents and retrieve the data that users need. Moreover, images must be described in natural sentences that reflect not only the names of the objects in an image but also the relations among them. The proposed system uses general images and captions available on the web as training data to generate captions for new images. Furthermore, because the learning cost is independent of the amount of data, the system is scalable, which makes it practical for large-scale data.