Generating Natural Language Description of Human Behavior from Video Images

Authors:
Affiliations:
Venue: ICPR '00 Proceedings of the International Conference on Pattern Recognition - Volume 4
Year: 2000

Citing 0
Cited 3

Real-Time Recognition of Human Gestures for 3D Interaction
AMDO '08 Proceedings of the 5th international conference on Articulated Motion and Deformable Objects

Toward natural interaction through visual recognition of body gestures in real-time
Interacting with Computers

Corpus-guided sentence generation of natural images
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Abstract

In visual surveillance applications, it is becoming popular to perceive video images and to interpret them using natural language concepts. In this paper, we propose a new approach to generating natural language descriptions of human behavior appearing in real video images. First, the head region of a person, taken as representative of the whole body, is extracted from each frame. Using a model-based method, the three-dimensional pose and position of the head are estimated. Next, the trajectory of these parameters is divided into segments of monotonic motion. For each segment, we evaluate conceptual features such as the degree of change in pose and position and the degree of change in relative distance to objects in the surroundings. By computing the product of these feature values, the most suitable verb is selected and the other syntactic elements are supplied. Finally, natural language text is generated using machine translation techniques.
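The segmentation and verb-selection steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature names (`position_change`, `distance_decrease`), the verb profiles, and the assumption that feature values lie in [0, 1] are all hypothetical, chosen only to show how a trajectory might be split into monotonic segments and how a product of feature values might rank candidate verbs.

```python
from math import prod


def split_monotonic(values, eps=1e-6):
    """Split a 1-D parameter trajectory into maximal rising, falling, or flat runs.

    Returns (start, end, direction) tuples with inclusive indices, where
    direction is 1 (rising), -1 (falling), or 0 (flat). Adjacent segments
    share their boundary sample. `eps` is an assumed noise tolerance.
    """
    if len(values) < 2:
        return []

    def sign(delta):
        if abs(delta) <= eps:
            return 0
        return 1 if delta > 0 else -1

    segments = []
    start = 0
    current = sign(values[1] - values[0])
    for i in range(1, len(values) - 1):
        step = sign(values[i + 1] - values[i])
        if step != current:
            segments.append((start, i, current))
            start, current = i, step
    segments.append((start, len(values) - 1, current))
    return segments


# Hypothetical verb profiles: True means the verb expects the feature to be
# high, False means it expects the feature to be low.
VERB_PROFILES = {
    "approach": {"position_change": True, "distance_decrease": True},
    "leave": {"position_change": True, "distance_decrease": False},
    "stand": {"position_change": False},
}


def score_verb(features, profile):
    """Multiply per-feature scores; a low-expecting feature contributes 1 - value."""
    return prod(
        features[name] if expects_high else 1.0 - features[name]
        for name, expects_high in profile.items()
    )


def select_verb(features, profiles=VERB_PROFILES):
    """Return the verb whose profile yields the largest product score."""
    return max(profiles, key=lambda verb: score_verb(features, profiles[verb]))


if __name__ == "__main__":
    # Distance of the head to an object over seven frames (made-up numbers).
    print(split_monotonic([0, 1, 2, 2, 2, 1, 0]))
    print(select_verb({"position_change": 0.9, "distance_decrease": 0.8}))
```

Because the score is a product, a single feature value near zero vetoes a verb regardless of the other features. Verbs with fewer features in their profile have fewer factors below one, so a real system would need to normalize for that.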