Describing video contents in natural language

  • Authors: Muhammad Usman Ghani Khan; Yoshihiko Gotoh

  • Affiliations: University of Sheffield, United Kingdom; University of Sheffield, United Kingdom

  • Venue: HYBRID '12 Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
  • Year: 2012

Abstract

This contribution addresses the generation of natural language descriptions of human actions, behaviour, and their relations with other objects observed in video streams. The work begins by applying conventional image processing techniques to extract high-level features from video. These features are converted into natural language descriptions using a context-free grammar. Although the feature extraction processes are error-prone at various levels, we explore approaches to combining their outputs to produce a coherent description. Evaluation is made by calculating ROUGE scores between human-annotated and machine-generated descriptions. Further, we introduce a task-based evaluation by human subjects, which provides a qualitative evaluation of the generated descriptions.
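
To make the ROUGE-based evaluation concrete, the following is a minimal sketch of ROUGE-N recall between one human-annotated description and one machine-generated description. The abstract does not say which ROUGE variant, tokenizer, or toolkit was used, so the choice of n-gram recall, lowercased whitespace tokenization, and the function names `ngrams` and `rouge_n_recall` are all assumptions made for illustration. The example sentences are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count.

    Assumption: lowercased whitespace tokenization, one reference per candidate.
    """
    ref_counts = ngrams(reference.lower().split(), n)
    cand_counts = ngrams(candidate.lower().split(), n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


# Hypothetical descriptions, for illustration only.
human = "a man is walking in the park"
machine = "a man walks in the park"

print(rouge_n_recall(human, machine, n=1))  # unigram recall
print(rouge_n_recall(human, machine, n=2))  # bigram recall
```

In this example five of the seven reference unigrams and three of the six reference bigrams also occur in the machine description. A score of this kind measures word overlap only, which is one reason a task-based evaluation by human subjects is a useful complement.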