Describing video contents in natural language

Authors:
Muhammad Usman Ghani Khan;Yoshihiko Gotoh
Affiliations:
University of Sheffield, United Kingdom;University of Sheffield, United Kingdom
Venue:
HYBRID '12 Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
Year:
2012

Citing 16
Cited 0

From image sequences to natural language: a first step toward automatic perception and description of motions. Applied Artificial Intelligence.
Natural Language Description of Human Activities from Video Images Based on Concept Hierarchy of Actions. International Journal of Computer Vision.
Context-based vision system for place and object recognition. ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2.
Video query: research directions. IBM Journal of Research and Development - Papers on multimedia systems.
Steps toward a cognitive vision system. AI Magazine.
Manual and automatic evaluation of summaries. AS '02 Proceedings of the ACL-02 Workshop on Automatic Summarization - Volume 4.
Human action recognition using star skeleton. Proceedings of the 4th ACM international workshop on Video surveillance and sensor networks.
Automatic Learning of Conceptual Knowledge in Image Sequences for Human Behavior Interpretation. IbPRIA '07 Proceedings of the 3rd Iberian conference on Pattern Recognition and Image Analysis, Part I.
Face detection and recognition of natural human emotion using Markov random fields. Personal and Ubiquitous Computing.
Semantic Representation and Recognition of Continued and Recursive Human Activities. International Journal of Computer Vision.
SimpleNLG: a realisation engine for practical applications. ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation.
The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision.
Context based object categorization: A critical survey. Computer Vision and Image Understanding.
A Novel Method for Efficient Indoor-Outdoor Image Classification. Journal of Signal Processing Systems.
Emotion recognition from arbitrary view facial images. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part VI.
Corpus-guided sentence generation of natural images. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Abstract

This contribution addresses the generation of natural language descriptions of human actions, behaviour and their relations with other objects observed in video streams. The work begins by applying conventional image processing techniques to extract high-level features from video. These features are then converted into natural language descriptions using a context-free grammar. Although the feature extraction processes are error-prone at various levels, we explore approaches to combining their outputs into a coherent description. Evaluation is performed by computing ROUGE scores between human-annotated and machine-generated descriptions. We further introduce a task-based evaluation by human subjects, which provides a qualitative assessment of the generated descriptions.
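The grammar-based generation step can be illustrated with a toy example. This is a minimal sketch, not the authors' grammar: it assumes the extracted features arrive as a dictionary with hypothetical slot names (`subject`, `action`, `object`, `place`), and it expands a small context-free grammar top-down, trying each rule's alternatives in order and falling back to a shorter rule whenever a slot was not detected.

```python
from typing import Dict, List, Optional

# Hypothetical toy grammar (not the paper's). Each nonterminal maps to its
# alternative right-hand sides, tried from most to least specific.
# Symbols in angle brackets are slots filled from detected video features;
# any other symbol not in GRAMMAR is emitted as a literal word.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["a", "<subject>"]],
    "VP": [
        ["is", "<action>", "OBJ", "LOC"],
        ["is", "<action>", "OBJ"],
        ["is", "<action>", "LOC"],
        ["is", "<action>"],
    ],
    "OBJ": [["a", "<object>"]],
    "LOC": [["in", "the", "<place>"]],
}


def expand(symbol: str, features: Dict[str, str]) -> Optional[List[str]]:
    """Expand a symbol into words, or return None if a required slot is missing."""
    if symbol.startswith("<") and symbol.endswith(">"):
        value = features.get(symbol[1:-1])
        return [value] if value else None
    if symbol not in GRAMMAR:
        return [symbol]  # literal terminal word
    for rhs in GRAMMAR[symbol]:
        words: List[str] = []
        for child in rhs:
            sub = expand(child, features)
            if sub is None:
                break  # this alternative needs an undetected feature
            words.extend(sub)
        else:
            return words  # every child expanded successfully
    return None


def describe(features: Dict[str, str]) -> str:
    """Realise one sentence from the features, or "" if no rule applies."""
    words = expand("S", features)
    if not words:
        return ""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


print(describe({"subject": "woman", "action": "holding",
                "object": "cup", "place": "kitchen"}))
print(describe({"subject": "man", "action": "running"}))
```

With `{"subject": "man", "action": "running"}` the object and place rules fail, so the grammar falls back to `VP -> is <action>` and yields "A man is running." Ordering alternatives from most to least specific is one simple way to degrade gracefully when some detectors miss; it is a design choice of this sketch, not something the abstract specifies.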
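ROUGE-N is, at its core, an n-gram recall: it counts how many of the n-grams in the human reference descriptions also appear in the machine-generated one. The sketch below is an illustrative re-implementation under simplifying assumptions (lowercasing, alphanumeric tokens, no stemming or stopword removal). The abstract does not state which ROUGE variant or settings were used, and the example sentences are invented.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, references: List[str], n: int = 1) -> float:
    """ROUGE-N recall of a candidate against one or more reference texts."""
    cand = ngrams(tokenize(candidate), n)
    matched = total = 0
    for ref in references:
        ref_grams = ngrams(tokenize(ref), n)
        # Clipped overlap: each n-gram counts at most min(candidate, reference) times.
        matched += sum(min(count, cand[gram]) for gram, count in ref_grams.items())
        total += sum(ref_grams.values())
    return matched / total if total else 0.0


machine = "A man is walking in the street."
human = ["A man walks along the street."]
print(round(rouge_n(machine, human, n=1), 3))  # 4 of 6 reference unigrams match
print(round(rouge_n(machine, human, n=2), 3))  # 2 of 5 reference bigrams match
```

With several references, matched and total counts are pooled over all of them, so a description is rewarded for overlapping with any annotator's wording. Standard ROUGE tooling offers further options (such as stemming) that this sketch omits.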