There is a need to extract video content automatically for efficient access, understanding, browsing and retrieval of videos. Detecting and interpreting human presence, actions and activities is one of the most valuable functions in the proposed framework. The general objective of this research is to analyze and process audio-video streams to build a robust audiovisual action recognition system that integrates, structures and accesses multimodal information through a multidimensional retrieval and extraction model. The research also presents a method to characterize, detect, identify and abstract actions by combining low-level and high-level features. The proposed technique characterizes action scenes by integrating cues obtained from both the audio and the video tracks: visual features (motion, edges and visual characteristics of objects) are combined with audio features to recognize actions. The model uses hidden Markov models (HMMs) and Gaussian mixture models (GMMs) to fuse these features and to represent the multidimensional structure of the framework. Compared with using either the visual or the audio track alone, the combined audiovisual information gives more reliable performance and allows the story content of movies to be understood in more detail. In several experiments, the recognition rate was 74% with visual features only, 65% with audio features only and 88% with combined audiovisual features, showing that recognition improves when audio and visual cues are combined.
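The abstract names GMMs as part of the fusion framework but does not specify the fusion rule. The following is a minimal sketch, assuming one pre-trained diagonal-covariance GMM per action class and per modality, and assuming weighted late fusion of log-likelihoods. This is a common scheme chosen for illustration, not necessarily the authors' method. The class `DiagGMM`, the function `fuse_and_classify`, the `visual_weight` parameter, the action labels and the feature values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Diagonal-covariance Gaussian mixture; parameters are assumed to be pre-trained."""
    weights: list[float]          # mixture weights (positive, summing to 1)
    means: list[list[float]]      # one mean vector per component
    variances: list[list[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x: list[float]) -> float:
        """Return log p(x) = logsumexp_k [log w_k + log N(x; mu_k, diag(var_k))]."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_gauss = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            terms.append(math.log(w) + log_gauss)
        top = max(terms)  # subtract the max for a numerically stable log-sum-exp
        return top + math.log(sum(math.exp(t - top) for t in terms))


def fuse_and_classify(visual_x, audio_x, models, visual_weight=0.5):
    """Weighted late fusion (an assumed scheme) of per-class GMM log-likelihoods.

    models maps an action label to a (visual_gmm, audio_gmm) pair.
    Returns the best label and the fused score for every label.
    """
    scores = {
        label: visual_weight * v_gmm.log_likelihood(visual_x)
        + (1 - visual_weight) * a_gmm.log_likelihood(audio_x)
        for label, (v_gmm, a_gmm) in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical 1-D features and hand-set parameters, for illustration only.
models = {
    "fight": (DiagGMM([1.0], [[5.0]], [[1.0]]), DiagGMM([1.0], [[3.0]], [[1.0]])),
    "dialogue": (DiagGMM([1.0], [[0.0]], [[1.0]]), DiagGMM([1.0], [[0.0]], [[1.0]])),
}
label, scores = fuse_and_classify([4.8], [2.5], models)
print(label)  # fight
```

Setting `visual_weight` to 1.0 or 0.0 reduces the sketch to visual-only or audio-only scoring, which loosely mirrors the single-modality comparison reported in the abstract; in a real system the weight would presumably be tuned on held-out data.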
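The abstract states that HMMs are used but does not describe how. One standard way to apply HMMs to action recognition is to train one model per action class and assign a new sequence to the class whose model gives it the highest likelihood, computed with the forward algorithm. The sketch below illustrates that standard technique under stated assumptions (discrete observation symbols, such as vector-quantized frame features, and pre-trained parameters). It is not the authors' implementation, and all function names, labels and parameter values are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def safe_log(p):
    """log(p), mapping zero probabilities to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the forward algorithm in log space.

    obs:   non-empty sequence of discrete symbol indices (assumed quantized features)
    start: start[i]    = P(first state is i)
    trans: trans[i][j] = P(next state is j | current state is i)
    emit:  emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # alpha[i] = log P(obs[0..t], state at time t is i)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify_sequence(obs, hmms):
    """Label obs with the action whose HMM (start, trans, emit) scores it highest."""
    return max(hmms, key=lambda label: forward_log_likelihood(obs, *hmms[label]))


# Hypothetical two-state models over two symbols, for illustration only.
START = [0.5, 0.5]
TRANS = [[0.8, 0.2], [0.2, 0.8]]
hmms = {
    "run": (START, TRANS, [[0.9, 0.1], [0.6, 0.4]]),
    "walk": (START, TRANS, [[0.1, 0.9], [0.4, 0.6]]),
}
print(classify_sequence([0, 0, 1], hmms))  # run
```

Working in log space keeps long sequences from underflowing to zero, which matters when a video clip yields hundreds of observation frames.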