Spatial-temporal local features and the bag-of-words representation have been widely used in the action recognition field. However, this framework usually neglects the internal spatial-temporal relations between video-words, resulting in ambiguity in the action recognition task, especially for videos “in the wild”. In this paper, we address this problem by utilizing the volumetric context around a video-word. A local histogram of the video-word distribution is calculated, referred to as the “context”, and further clustered into contextual words. To use the contextual information effectively, spatial-temporal descriptive video-phrases (ST-DVPs) and spatial-temporal descriptive video-cliques (ST-DVCs) are proposed. A general framework for ST-DVP and ST-DVC generation is described, and action recognition is then performed based on these representations and their combinations. The proposed method is evaluated on two challenging human action datasets: the KTH dataset and the YouTube dataset. Experimental results confirm the validity of our approach.
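The core step described above, building a local histogram of video-word labels inside a volumetric neighborhood around each interest point and clustering those histograms into contextual words, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name `contextual_words`, the box-shaped neighborhood defined by `radius_xy`/`radius_t`, and the plain k-means step are all choices made here for clarity.

```python
import numpy as np

def contextual_words(points, word_ids, n_words, radius_xy, radius_t,
                     n_context, seed=0):
    """For each interest point, build a local histogram of video-word
    labels within a spatial-temporal volume (the 'context'), then
    cluster the histograms into contextual words. Illustrative sketch,
    not the paper's exact procedure."""
    pts = np.asarray(points, dtype=float)   # shape (N, 3): x, y, t
    ids = np.asarray(word_ids)              # video-word label per point

    # Local histogram of video-words around each point (the "context").
    hists = np.zeros((len(pts), n_words))
    for i, (x, y, t) in enumerate(pts):
        near = (np.abs(pts[:, 0] - x) <= radius_xy) & \
               (np.abs(pts[:, 1] - y) <= radius_xy) & \
               (np.abs(pts[:, 2] - t) <= radius_t)
        near[i] = False                     # exclude the point itself
        for w in ids[near]:
            hists[i, w] += 1

    # Simple k-means over the context histograms -> contextual words.
    rng = np.random.default_rng(seed)
    centers = hists[rng.choice(len(hists), n_context, replace=False)]
    for _ in range(10):
        dist = ((hists[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for k in range(n_context):
            if (labels == k).any():
                centers[k] = hists[labels == k].mean(0)
    return hists, labels
```

The contextual-word label assigned to each point could then be paired with its original video-word to form the composite representations (the ST-DVPs and ST-DVCs described in the paper); the neighborhood shape, radii, and the clustering method are tunable design choices.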