Action recognition has attracted much attention in human behavior analysis in recent years, and local spatial-temporal (ST) features are widely adopted. However, most existing works represent an action video as a histogram of ST words and, because of the local nature of these features, fail to capture the fine structure of actions. In this paper, we propose a novel method to simultaneously localize and recognize action units (AUs) by treating them as 3D (x, y, t) objects. First, we record all local ST features in a codebook, together with their action class labels and their relative positions to the respective AU centers. This models the joint distribution of class label and relative position in a non-parametric manner. Given a novel video, we match its ST features to codebook entries and cast votes for the positions of its AU centers. We then use the localization result to recognize these AUs. Experiments on a public dataset demonstrate that our method performs well.
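The voting scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: the codebook layout, the nearest-neighbour matching, and the coarse accumulator grid below are all simplifying assumptions, introduced only to show how features with stored (dx, dy, dt) offsets cast votes for a 3D action-unit center.

```python
import numpy as np

# Hypothetical codebook: each entry holds a local ST descriptor, its action
# class label, and its (dx, dy, dt) offset to the action-unit (AU) center.
codebook = [
    {"desc": np.array([0.9, 0.1]), "label": "wave", "offset": (2, 3, 1)},
    {"desc": np.array([0.1, 0.8]), "label": "jump", "offset": (-1, 0, 2)},
]

def vote_for_centers(features, grid_shape=(16, 16, 16)):
    """Match each ST feature to its nearest codebook entry and cast a vote
    for the implied (x, y, t) AU center in a per-class 3D accumulator."""
    acc = {}  # class label -> 3D vote grid
    for pos, desc in features:  # pos = (x, y, t) of the detected feature
        # Nearest-neighbour match in descriptor space (illustrative choice;
        # a soft or probabilistic assignment would also fit the scheme).
        entry = min(codebook, key=lambda e: float(np.linalg.norm(e["desc"] - desc)))
        center = tuple(p + o for p, o in zip(pos, entry["offset"]))
        if all(0 <= c < s for c, s in zip(center, grid_shape)):
            grid = acc.setdefault(entry["label"], np.zeros(grid_shape))
            grid[center] += 1.0
    return acc

# Two toy features voting for "wave" AU centers:
features = [((3, 4, 2), np.array([0.85, 0.15])),
            ((4, 5, 3), np.array([0.95, 0.05]))]
acc = vote_for_centers(features)
# The localized AU is the accumulator peak; its class gives the recognition.
best_label = max(acc, key=lambda k: acc[k].max())
peak = np.unravel_index(np.argmax(acc[best_label]), acc[best_label].shape)
```

In practice the accumulator would be smoothed and peaks found by mean-shift or non-maximum suppression, but the core idea is the same: localization (the peak position) and recognition (the peak's class) fall out of the same voting pass.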