Collective generation of natural image descriptions

  • Authors:
  • Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, Yejin Choi

  • Affiliations:
  • Stony Brook University, Stony Brook, NY (all authors)

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year:
  • 2012

Abstract

We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.
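To make the retrieve-then-compose idea concrete, here is a minimal sketch of phrase selection as an integer linear program, in the spirit of the abstract's "constraint optimization" framing. This is not the authors' formulation: the candidate phrases, relevance scores, and one-phrase-per-slot constraints are hypothetical stand-ins for the paper's richer content-planning, surface-realization, and discourse constraints, and the example assumes the `pulp` ILP library.

```python
import pulp

# Hypothetical candidates: phrases retrieved from captions of visually
# similar images, each with a relevance score and a syntactic slot type.
candidates = [
    {"text": "a brown dog",      "type": "NP", "score": 0.90},
    {"text": "a small puppy",    "type": "NP", "score": 0.70},
    {"text": "runs across",      "type": "VP", "score": 0.80},
    {"text": "sits on",          "type": "VP", "score": 0.50},
    {"text": "the grassy field", "type": "PP", "score": 0.85},
    {"text": "a wooden bench",   "type": "PP", "score": 0.40},
]

prob = pulp.LpProblem("phrase_selection", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(candidates))]

# Objective: maximize total relevance of the selected phrases.
prob += pulp.lpSum(x[i] * c["score"] for i, c in enumerate(candidates))

# Constraint: exactly one phrase per slot (NP, VP, PP), a toy stand-in
# for the paper's interconnected composition constraints.
for slot in ("NP", "VP", "PP"):
    prob += pulp.lpSum(x[i] for i, c in enumerate(candidates)
                       if c["type"] == slot) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(" ".join(c["text"] for i, c in enumerate(candidates)
               if x[i].value() == 1))
# -> "a brown dog runs across the grassy field"
```

The actual system solves a substantially richer joint problem, coupling content planning, surface realization, and discourse structure rather than filling three fixed slots.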