Corpus-guided sentence generation of natural images

  • Authors:
  • Yezhou Yang; Ching Lik Teo; Hal Daumé III; Yiannis Aloimonos

  • Affiliation:
  • University of Maryland Institute for Advanced Computer Studies, College Park, Maryland (all authors)

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011


Abstract

We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes, and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. Because predicting actions directly from still images is unreliable, we use a language model trained on the English Gigaword corpus to obtain their estimates, together with the probabilities of co-located nouns, scenes, and prepositions. These estimates serve as the parameters of an HMM that models the sentence generation process, with sentence components as the hidden nodes and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces more readable and descriptive sentences than naive strategies that use vision alone.
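The decoding step described above — choosing the most likely sequence of sentence components given noisy detector scores and corpus-derived co-occurrence probabilities — can be sketched with a standard Viterbi pass. This is a minimal illustration, not the authors' implementation: the candidate words, transition probabilities, and detector confidences below are all made up for the example.

```python
import math

def viterbi(slots, trans, emit, start):
    """Find the most likely word sequence over sentence-component slots.

    slots: list of candidate-word lists, one per component (e.g. noun, verb).
    trans[(prev, cur)]: corpus co-occurrence probability of cur following prev.
    emit[word]: detector confidence that the image supports `word`.
    start[word]: prior probability for words in the first slot.
    """
    # best[w] = (log-prob of best path ending in w, that path)
    best = {w: (math.log(start[w]) + math.log(emit[w]), [w])
            for w in slots[0]}
    for candidates in slots[1:]:
        new = {}
        for w in candidates:
            # Extend the best previous path with word w.
            score, path = max(
                ((s + math.log(trans[(prev, w)]) + math.log(emit[w]), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0])
            new[w] = (score, path + [w])
        best = new
    return max(best.values(), key=lambda t: t[0])[1]

# Toy example: one noun slot and one verb slot (illustrative numbers only).
slots = [["dog", "cat"], ["runs", "sits"]]
trans = {("dog", "runs"): 0.6, ("dog", "sits"): 0.4,
         ("cat", "runs"): 0.3, ("cat", "sits"): 0.7}
emit = {"dog": 0.8, "cat": 0.2, "runs": 0.5, "sits": 0.5}
start = {"dog": 0.5, "cat": 0.5}
print(viterbi(slots, trans, emit, start))  # → ['dog', 'runs']
```

Here the strong detector score for "dog" outweighs the corpus preference for "cat sits", which is the kind of vision-plus-language trade-off the paper's HMM formulation captures.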