Previous work on action recognition has focused on adapting hand-designed local features, such as SIFT or HOG, from static images to the video domain. In this paper, we propose using unsupervised feature learning to learn features directly from video data. More specifically, we present an extension of the Independent Subspace Analysis (ISA) algorithm that learns invariant spatio-temporal features from unlabeled video data. We find that, despite its simplicity, this method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations. By replacing hand-designed features with our learned features, we achieve classification results superior to all previously published results on the Hollywood2, UCF, KTH and YouTube action recognition datasets. On the challenging Hollywood2 and YouTube action datasets we obtain 53.3% and 75.8%, respectively, which are approximately 5% better than the best previously published results. We also discuss further benefits of the method, such as the ease of training and the efficiency of both training and prediction. Our code and learned spatio-temporal features can be downloaded here: http://ai.stanford.edu/~wzou/.
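
As a rough illustration of the kind of feature the abstract refers to, the sketch below computes pooled ISA activations in the standard textbook form: squared linear filter responses are summed within each subspace and passed through a square root. It is not the authors' released code. The function name `isa_responses`, the flattened-patch input and the toy filter matrix are assumptions made for this example, and the learning step that fits the filters to unlabeled video (as well as the stacking and convolution described above) is not shown.

```python
import math


def isa_responses(x, W, subspace_size):
    """Pooled ISA activations for one flattened patch x (hypothetical helper).

    W holds one linear filter per row; each consecutive group of
    `subspace_size` rows is treated as one subspace.
    """
    if len(W) % subspace_size != 0:
        raise ValueError("number of filters must be a multiple of subspace_size")
    # First layer: squared response of each linear filter.
    squared = [sum(w_j * x_j for w_j, x_j in zip(w, x)) ** 2 for w in W]
    # Second layer: square root of the summed squares within each subspace.
    return [
        math.sqrt(sum(squared[i:i + subspace_size]))
        for i in range(0, len(squared), subspace_size)
    ]


# Toy filters (assumed, not learned): two orthogonal filters in one subspace.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
print(isa_responses([3.0, 4.0, 0.0], W, 2))  # [5.0]
print(isa_responses([5.0, 0.0, 0.0], W, 2))  # [5.0]
```

The two inputs differ only in how their energy is split between the two filters of the subspace, yet they give the same pooled response. This is the sense in which such features are invariant within a subspace.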