We present a new descriptor for the local representation of human actions. In contrast to state-of-the-art descriptors, which use spatio-temporal features to describe cuboids detected in video sequences, we propose a 2D descriptor based on the Laplacian pyramid for efficiently encoding spatio-temporal regions of interest. Image templates, including structural planes and motion templates, are first extracted from a cuboid to encode its structural and motion features. A 2D Laplacian pyramid is then applied to decompose each of these images into a series of sub-band feature maps, followed by a two-stage feature extraction: Gabor filtering and max pooling. The filtering enhances motion-related edge and orientation information. To capture more discriminative and invariant features, max pooling is applied to the Gabor filter outputs, both between scales within filter banks and over spatial neighborhoods. The resulting local features associated with cuboids are fed to localized soft-assignment coding with max pooling under the Bag-of-Words (BoW) model to represent an action. The image templates, i.e., MHI and TOP, explicitly encode the motion and structure information in the video sequences, and the proposed Laplacian pyramid coding descriptor provides an informative representation of them through multi-scale analysis. The use of localized soft-assignment coding and max pooling yields a robust representation of actions. Experimental results on the benchmark KTH dataset and the newly released and challenging HMDB51 dataset demonstrate the effectiveness of the proposed method for human action recognition.
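The descriptor pipeline described above (Laplacian pyramid decomposition of a template, a Gabor filter bank, then max pooling over space and across scales) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the kernel sizes, the number of pyramid levels and orientations, and the global-max pooling at the end are simplifying assumptions chosen for brevity.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def convolve2d(img, kernel):
    # Naive same-size 2D convolution with edge padding (fine for small templates).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def laplacian_pyramid(img, levels=3):
    # Each band is the difference between an image and its blurred copy;
    # the blurred copy is downsampled to form the next octave.
    pyr, cur, g = [], img.astype(float), gaussian_kernel()
    for _ in range(levels):
        blurred = convolve2d(cur, g)
        pyr.append(cur - blurred)      # band-pass (Laplacian) layer
        cur = blurred[::2, ::2]        # downsample for the next level
    pyr.append(cur)                    # low-pass residual
    return pyr

def gabor_kernel(theta, size=7, sigma=2.0, lam=4.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    yr = -xx * np.sin(theta) + yy * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def describe(template, levels=3, n_orient=4):
    """Laplacian pyramid -> Gabor filtering -> max pooling over space and scale."""
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    responses = []
    for band in laplacian_pyramid(template, levels):
        per_orient = []
        for th in thetas:
            r = np.abs(convolve2d(band, gabor_kernel(th)))
            # Spatial max pooling over 2x2 neighborhoods ...
            h, w = r.shape[0] // 2 * 2, r.shape[1] // 2 * 2
            pooled = r[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
            # ... reduced here to a single global max per orientation (toy choice).
            per_orient.append(pooled.max())
        responses.append(per_orient)
    # Max pooling across pyramid scales: one value per orientation.
    return np.max(np.array(responses), axis=0)

# Toy binary "motion template" standing in for an MHI extracted from a cuboid.
template = np.zeros((32, 32))
template[8:24, 12:20] = 1.0
feat = describe(template)
print(feat.shape)  # (4,)
```

In the full method, such per-cuboid feature vectors (without the crude global max) would then be quantized by localized soft-assignment coding and max-pooled over the video to form the BoW action representation.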