A multidimensional approach to detect action scene in video data

  • Authors:
  • L. N. Abdullah;S. A. M. Noah;T. M. T. Sembok;K. Omar

  • Affiliations:
  • University Putra Malaysia, UPM Serdang, Malaysia;University Kebangsaan Malaysia, Bangi, Malaysia;University Kebangsaan Malaysia, Bangi, Malaysia;University Kebangsaan Malaysia, Bangi, Malaysia

  • Venue:
  • ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
  • Year:
  • 2008

Abstract

There is a need to automatically extract video content for efficient access, understanding, browsing, and retrieval of videos. Detecting and interpreting human presence, actions, and activities is one of the most valuable functions in the proposed framework. The general objective of this research is to analyze and process audio-video streams to build a robust audiovisual action recognition system by integrating, structuring, and accessing multimodal information via a multidimensional retrieval and extraction model. The research also presents a method to characterize, detect, identify, and abstract actions by combining low-level and high-level features. The proposed technique characterizes action scenes by integrating cues obtained from both the audio and video tracks. Actions are recognized by combining visual features (motion, edges, and visual characteristics of objects), audio features, and video. The model uses hidden Markov models (HMMs) and Gaussian mixture models (GMMs) to provide a framework for fusing these features and to represent the multidimensional structure of the framework. Compared with using a single source, either the visual or the audio track alone, the combined audio-visual information provides more reliable performance and allows the story content of movies to be understood in more detail. Several experiments were conducted; the reported results were 74% using visual features only, 65% using audio features only, and 88% using combined audiovisual features. The results show an improvement in recognition when audio and visual cues are combined.
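The abstract does not specify how the audio and visual cues are fused, so the following is only a minimal sketch of one common option, score-level (late) fusion, and not the authors' method. It assumes each class is modeled per modality by a single diagonal-covariance Gaussian (a one-component GMM) and that the two per-class log-likelihoods are combined by a weighted sum. The function names and the weight `w_visual` are hypothetical, introduced only for illustration.

```python
import numpy as np

def fit_gaussian(x):
    """Fit a diagonal-covariance Gaussian (a one-component GMM) to the rows of x."""
    # A small floor on the variance avoids division by zero for constant features.
    return x.mean(axis=0), x.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    """Log-density of each row of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def fuse_and_classify(visual, audio, visual_models, audio_models, w_visual=0.5):
    """Late fusion: weighted sum of per-class log-likelihoods from each modality.

    visual_models and audio_models are lists of (mean, var) pairs, one per class,
    in the same class order. Returns the index of the best-scoring class per sample.
    """
    scores = []
    for (v_mean, v_var), (a_mean, a_var) in zip(visual_models, audio_models):
        score = (w_visual * log_likelihood(visual, v_mean, v_var)
                 + (1.0 - w_visual) * log_likelihood(audio, a_mean, a_var))
        scores.append(score)
    return np.argmax(np.stack(scores, axis=1), axis=1)
```

In this sketch, setting `w_visual` to 1.0 or 0.0 reduces the classifier to a visual-only or audio-only decision, which is one way to set up the kind of single-modality versus combined comparison the abstract reports. Modeling temporal structure with HMMs, as the abstract mentions, is outside the scope of this example.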