Sparse coding on local spatial-temporal volumes for human action recognition

  • Authors:
  • Yan Zhu; Xu Zhao; Yun Fu; Yuncai Liu

  • Affiliations:
  • Shanghai Jiao Tong University, Shanghai, China; Shanghai Jiao Tong University, Shanghai, China; Department of CSE, SUNY at Buffalo, NY; Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • ACCV'10: Proceedings of the 10th Asian Conference on Computer Vision - Volume Part II
  • Year:
  • 2010

Abstract

By extracting local spatial-temporal features from videos, many recently proposed approaches for action recognition achieve promising performance. The Bag-of-Words (BoW) model is commonly used in these approaches to obtain video-level representations. However, the BoW model coarsely assigns each feature vector to its closest visual word, which inevitably introduces nontrivial quantization errors and limits further improvement of classification rates. To obtain a more accurate and discriminative representation, in this paper we propose an approach for action recognition that encodes local 3D spatial-temporal gradient features within the sparse coding framework. In this framework, each local spatial-temporal feature is represented as a linear combination of a few "atoms" from a trained dictionary. We also investigate constructing the dictionary under the guidance of transfer learning: we collect a large, diverse set of video clips from sports games and movies, from which a set of universal atoms composing the dictionary is learned via an online learning strategy. We test our approach on the KTH dataset and the UCF Sports dataset. Experimental results demonstrate that our approach outperforms state-of-the-art techniques on the KTH dataset and achieves comparable performance on the UCF Sports dataset.
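
To illustrate the pipeline the abstract describes (sparse coding of local spatio-temporal descriptors with an online-learned dictionary, followed by pooling into a video-level representation and classification), here is a minimal sketch in Python. It is not the authors' implementation: the descriptors are random stand-ins for the 3D gradient features, and the dictionary size, sparsity level, max-pooling step, and linear SVM are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

# Stand-in for local 3D spatial-temporal gradient descriptors; in the paper these
# would be extracted around interest points in each video clip. Random data is
# used here purely to show the encoding steps.
rng = np.random.default_rng(0)
n_videos, descriptors_per_video, descriptor_dim = 20, 100, 72
videos = [rng.standard_normal((descriptors_per_video, descriptor_dim))
          for _ in range(n_videos)]
labels = np.arange(n_videos) % 2  # hypothetical two-class action labels

# Learn a dictionary of atoms from all training descriptors with an online
# (mini-batch) strategy; n_components and alpha are assumed values.
all_descriptors = np.vstack(videos)
dict_learner = MiniBatchDictionaryLearning(
    n_components=128,               # dictionary size (assumption)
    alpha=1.0,                      # sparsity penalty (assumption)
    batch_size=256,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=5,    # each descriptor uses only a few atoms
    random_state=0,
)
dict_learner.fit(all_descriptors)

def video_representation(descriptors):
    """Sparse-code each descriptor, then max-pool absolute codes over the video."""
    codes = dict_learner.transform(descriptors)  # (n_descriptors, n_components)
    return np.abs(codes).max(axis=0)

X = np.array([video_representation(v) for v in videos])

# A linear SVM on the pooled sparse codes serves as the action classifier.
clf = LinearSVC().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```

In this sketch the max-pooled sparse codes replace the hard-assignment BoW histogram, which is the quantization-error issue the abstract points to; any other pooling (e.g., average pooling) could be substituted at that step.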