Efficient image annotation for automatic sentence generation

Authors:
Yoshitaka Ushiku; Tatsuya Harada; Yasuo Kuniyoshi
Affiliations:
The University of Tokyo, Tokyo, Japan; The University of Tokyo & JST PRESTO, Tokyo, Japan; The University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the 20th ACM international conference on Multimedia
Year:
2012

Abstract

Sentence generation from images is an ultimate goal of image recognition. In this paper, we attack a novel problem, the "multi-keyphrase problem", to address this goal. We hypothesize that image contents can be described with multi-keyphrases, and that a natural sentence can be generated by connecting multi-keyphrases with an experimental grammar model. Existing methods require semantic knowledge such as object, action, or scene labels, so using them requires preparing a highly organized dataset. To avoid this requirement, we propose a novel online learning method for multi-keyphrase estimation. The proposed framework, although simple and scalable, can generate sentences from images with no semantic knowledge. Moreover, the proposed method for multi-keyphrase estimation is applicable to image annotation, where it achieves state-of-the-art performance. Our experiment, which uses only images and texts, demonstrates that the proposed framework is useful for sentence generation from images.