Midge: generating image descriptions from computer vision detections

  • Authors:
  • Margaret Mitchell; Xufeng Han; Jesse Dodge; Alyssa Mensch; Amit Goyal; Alex Berg; Kota Yamaguchi; Tamara Berg; Karl Stratos; Hal Daumé III

  • Affiliations:
  • University of Aberdeen and Oregon Health and Science University; Stony Brook University; University of Maryland; MIT; University of Maryland; Stony Brook University; Stony Brook University; Stony Brook University; Columbia University; University of Maryland

  • Venue:
  • EACL '12: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2012

Abstract

This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.