Sparse coding, which encodes natural visual signals into a sparse space for visual codebook generation and feature quantization, has been used successfully in many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, characterizing the visual patterns of diverse human actions, which vary both spatially and temporally, is more complex and poses additional challenges for the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme that learns a discriminative dictionary and optimizes the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. Using the sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. First, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Second, to avoid an exhaustive scan of entire videos during VOI localization, we extend Spatial Pyramid Matching into the temporal domain, yielding Spatial Temporal Pyramid Matching, to obtain the VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework also avoids prohibitive computations in local similarity matching (e.g., nearest neighbors voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of state-of-the-art methods.
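To make the idea of extending a spatial pyramid into the temporal domain more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the sparse codes of the local features are already computed, that each feature has an (x, y, t) location, and that plain max pooling of absolute code values is used in every pyramid cell (the abstract does not specify the optimized pooling strategy). The function name `st_pyramid_pool` and all of its parameters are hypothetical.

```python
import numpy as np

def st_pyramid_pool(codes, positions, extent, levels=2):
    """Max-pool sparse codes over a spatio-temporal pyramid (illustrative sketch).

    codes:     (N, K) array, one sparse code per local feature
    positions: (N, 3) array of (x, y, t) feature locations
    extent:    (width, height, n_frames) of the video volume
    levels:    number of pyramid levels; level l splits each axis into 2**l cells
    """
    codes = np.asarray(codes, dtype=float)
    positions = np.asarray(positions, dtype=float)
    extent = np.asarray(extent, dtype=float)
    n_atoms = codes.shape[1]
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        # Map each feature to an integer cell index along x, y and t.
        idx = np.floor(positions / extent * cells).astype(int)
        idx = np.clip(idx, 0, cells - 1)
        # Flatten the 3-D cell index into a single row index.
        flat = (idx[:, 0] * cells + idx[:, 1]) * cells + idx[:, 2]
        level_feat = np.zeros((cells ** 3, n_atoms))
        # Assumed pooling rule: per-cell maximum of absolute code values.
        np.maximum.at(level_feat, flat, np.abs(codes))
        pooled.append(level_feat.ravel())
    # Concatenate all levels into one video-level (or sub-volume) descriptor.
    return np.concatenate(pooled)
```

With K dictionary atoms and L levels, the descriptor has K times the sum of 8^l entries for l = 0, ..., L-1, so it grows quickly with L. In a coarse-to-fine setting, such a descriptor could be computed for candidate sub-volumes rather than for every possible window, which is the kind of saving the abstract alludes to.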