Natural language descriptions of visual scenes: corpus generation and analysis

  • Authors:
  • Muhammad Usman Ghani Khan, Rao Muhammad Adeel Nawab, Yoshihiko Gotoh

  • Affiliations:
  • University of Sheffield, United Kingdom (all authors)

  • Venue:
  • EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
  • Year:
  • 2012


Abstract

As video content continues to expand, it is increasingly important to properly annotate videos for effective search, mining, and retrieval. While annotating images with keywords is relatively well explored, further work is needed on annotating videos with natural language to improve the quality of video search. The focus of this work is to present a video dataset with natural language descriptions, a step beyond keyword-based tagging. We describe our initial experiences with a corpus consisting of descriptions for video segments crafted from TREC video data. Analysis of the descriptions created by 13 annotators provides insights into humans' interests and thoughts about videos. Such a resource can also be used to evaluate automatic natural language generation systems for video.