Midge: generating image descriptions from computer vision detections

Authors:
Margaret Mitchell;Xufeng Han;Jesse Dodge;Alyssa Mensch;Amit Goyal;Alex Berg;Kota Yamaguchi;Tamara Berg;Karl Stratos;Hal Daumé, III
Affiliations:
U. of Aberdeen and Oregon Health and Science University;Stony Brook University;U. of Maryland;MIT;U. of Maryland;Stony Brook University;Stony Brook University;Stony Brook University;Columbia University;U. of Maryland
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Abstract

This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.
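
The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy corpus, the thresholds, and the names `cooccurrence_counts` and `filter_detections` are all assumptions made for illustration. It uses plain noun co-occurrence counts rather than syntactically informed statistics, and it does not build syntactic trees. It only shows how corpus statistics might discard a detection that rarely appears alongside the most confident one.

```python
from collections import Counter
from itertools import combinations

# Toy "corpus": the set of nouns in each of five made-up descriptions.
# Illustrative data only, not taken from the paper.
TOY_CORPUS = [
    {"person", "dog", "grass"},
    {"person", "bicycle", "road"},
    {"dog", "grass"},
    {"person", "dog"},
    {"boat", "water"},
]


def cooccurrence_counts(corpus):
    """Count how often each unordered noun pair shares a description."""
    counts = Counter()
    for nouns in corpus:
        for a, b in combinations(sorted(nouns), 2):
            counts[(a, b)] += 1
    return counts


def filter_detections(detections, counts, min_score=0.3, min_count=1):
    """Keep labels that are confident enough and co-occur with the top detection.

    `detections` is a list of (label, score) pairs; both thresholds are
    hypothetical values chosen for this example.
    """
    confident = [(label, score) for label, score in detections if score >= min_score]
    if not confident:
        return []
    # Use the highest-scoring detection as the anchor for the co-occurrence check.
    anchor = max(confident, key=lambda d: d[1])[0]
    kept = []
    for label, _score in confident:
        pair = tuple(sorted((anchor, label)))
        if label == anchor or counts[pair] >= min_count:
            kept.append(label)
    return kept


counts = cooccurrence_counts(TOY_CORPUS)
detections = [("person", 0.9), ("dog", 0.6), ("boat", 0.5), ("bicycle", 0.2)]
kept = filter_detections(detections, counts)
print(kept)
```

In this toy run, "bicycle" is dropped for low detector confidence and "boat" is dropped because it never co-occurs with "person" in the toy corpus, leaving "person" and "dog" as candidates for a description.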