Spatial-temporal local features and the bag-of-words representation have been widely used in the action recognition field. However, this framework usually neglects the internal spatial-temporal relations between video-words, resulting in ambiguity in the action recognition task, especially for videos “in the wild”. In this paper, we address this problem by utilizing the volumetric context around a video-word. A local histogram of the video-word distribution is calculated, referred to as the “context”, and further clustered into contextual words. To use the contextual information effectively, spatial-temporal descriptive video-phrases (ST-DVPs) and spatial-temporal descriptive video-cliques (ST-DVCs) are proposed. A general framework for ST-DVP and ST-DVC generation is described, and action recognition is then performed based on these representations and their combinations. The proposed method is evaluated on two challenging human action datasets: the KTH dataset and the YouTube dataset. Experimental results confirm the validity of our approach.
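The core step described above, building a local histogram of video-word labels inside a volumetric neighborhood around each interest point and clustering those histograms into contextual words, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name `contextual_words`, the box-shaped neighborhood defined by `radius_xy`/`radius_t`, and the plain k-means step are all choices made here for clarity.

```python
import numpy as np

def contextual_words(points, word_ids, n_words, radius_xy, radius_t,
                     n_context, seed=0):
    """For each interest point, build a local histogram of video-word
    labels within a spatial-temporal volume (the 'context'), then
    cluster the histograms into contextual words. Illustrative sketch,
    not the paper's exact procedure."""
    pts = np.asarray(points, dtype=float)   # shape (N, 3): x, y, t
    ids = np.asarray(word_ids)              # video-word label per point

    # Local histogram of video-words around each point (the "context").
    hists = np.zeros((len(pts), n_words))
    for i, (x, y, t) in enumerate(pts):
        near = (np.abs(pts[:, 0] - x) <= radius_xy) & \
               (np.abs(pts[:, 1] - y) <= radius_xy) & \
               (np.abs(pts[:, 2] - t) <= radius_t)
        near[i] = False                     # exclude the point itself
        for w in ids[near]:
            hists[i, w] += 1

    # Simple k-means over the context histograms -> contextual words.
    rng = np.random.default_rng(seed)
    centers = hists[rng.choice(len(hists), n_context, replace=False)]
    for _ in range(10):
        dist = ((hists[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for k in range(n_context):
            if (labels == k).any():
                centers[k] = hists[labels == k].mean(0)
    return hists, labels
```

The contextual-word label assigned to each point could then be paired with its original video-word to form the composite representations (the ST-DVPs and ST-DVCs described in the paper); the neighborhood shape, radii, and the clustering method are tunable design choices.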