Generating Natural Language Description of Human Behavior from Video Images

  • Authors:
  • Affiliations:
  • Venue: ICPR '00 Proceedings of the International Conference on Pattern Recognition - Volume 4
  • Year: 2000

Abstract

In visual surveillance applications, it is becoming common to perceive video images and interpret them using natural language concepts. In this paper, we propose a new approach to generating natural language descriptions of human behavior appearing in real video images. First, the head region of a human, standing in for the whole body, is extracted from each frame. Using a model-based method, the three-dimensional pose and position of the head are estimated. Next, the trajectory of these parameters is divided into segments of monotonic motion. For each segment, we evaluate conceptual features such as the degree of change in pose and position and the degree of change in relative distance to objects in the surroundings. The most suitable verb is selected by calculating the product of these feature values, and the other syntactic elements are then supplied. Finally, natural language text is generated using machine translation techniques.
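
The abstract gives no implementation details, so the following is only a minimal sketch of two of the steps it names: splitting a parameter trajectory into monotonic segments and choosing a verb by multiplying feature values. Everything specific here is an assumption made for illustration, not the authors' method: the function names `split_monotonic` and `select_verb`, the feature names (`position_change`, `pose_change`, `closing`), the verb list, the `high`/`low` scoring functions, and the convention that each feature is normalized to [0, 1].

```python
from math import prod


def split_monotonic(values, eps=1e-6):
    """Split a 1-D trajectory into runs whose direction of change is constant.

    Returns inclusive (start, end) index pairs; adjacent segments share
    their boundary sample. Changes smaller than eps count as "no change".
    """
    if len(values) < 2:
        return []

    def sign(delta):
        if abs(delta) <= eps:
            return 0
        return 1 if delta > 0 else -1

    segments = []
    start = 0
    current = sign(values[1] - values[0])
    for i in range(1, len(values) - 1):
        s = sign(values[i + 1] - values[i])
        if s != current:
            # Direction changed at sample i: close the current segment there.
            segments.append((start, i))
            start = i
            current = s
    segments.append((start, len(values) - 1))
    return segments


def high(x):
    """Score that grows with the feature value, clamped to [0, 1]."""
    return max(0.0, min(1.0, x))


def low(x):
    """Score that shrinks as the feature value grows."""
    return 1.0 - high(x)


# Hypothetical verb models: each verb lists the features it cares about
# and how each feature should score. Not taken from the paper.
VERB_MODELS = {
    "stand": {"position_change": low, "pose_change": low},
    "turn": {"position_change": low, "pose_change": high},
    "approach": {"position_change": high, "closing": high},
    "leave": {"position_change": high, "closing": low},
}


def select_verb(features, models=VERB_MODELS):
    """Return (verb, score) for the verb with the largest product of scores."""
    scores = {
        verb: prod(score(features[name]) for name, score in model.items())
        for verb, model in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy head-height trajectory: rising, flat, then falling.
    print(split_monotonic([0, 1, 2, 2, 2, 1, 0]))
    # Toy feature values for one segment (assumed already normalized).
    print(select_verb({"position_change": 0.9, "pose_change": 0.2, "closing": 0.8}))
```

Using a product means that a single feature scoring near zero rules a verb out, however well the other features match; a weighted sum would be more forgiving, but the abstract specifies a product.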