Fast human action classification and VOI localization with enhanced sparse coding

Authors:
Shiyang Lu;Jian Zhang;Zhiyong Wang;David Dagan Feng
Affiliations:
The University of Sydney, Australia;University of Technology, Sydney, Australia;The University of Sydney, Australia;The University of Sydney, Australia
Venue:
Journal of Visual Communication and Image Representation
Year:
2013

Abstract

Sparse coding, which encodes natural visual signals into a sparse space for visual codebook generation and feature quantization, has been used successfully in many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, characterizing the visual patterns of diverse human actions, which vary both spatially and temporally, poses additional challenges for the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme that learns a discriminative dictionary and optimizes the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. Using the sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) of the actions. First, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Second, to avoid an exhaustive scan of entire videos during VOI localization, we extend Spatial Pyramid Matching into the temporal domain, as Spatial Temporal Pyramid Matching, to obtain the VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework also avoids the prohibitive computation of local similarity matching (e.g., nearest-neighbor voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and on the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of state-of-the-art methods.
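
The first two stages described above (sparse encoding of local features, then pooling over a spatial-temporal pyramid) can be illustrated with a minimal sketch. This is not the authors' enhanced scheme: the abstract does not specify the solver, the dictionary learning procedure, the pooling operator, or the pyramid layout. The sketch assumes a plain L1-regularized objective solved with ISTA, max pooling of absolute code values, and a pyramid that halves every axis at each level. The function names `ista_sparse_code` and `st_pyramid_max_pool`, the parameters `lam`, `n_iter`, and `levels`, and the random demo data are all hypothetical.

```python
import numpy as np


def ista_sparse_code(x, D, lam=0.1, n_iter=100):
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    Generic sparse coding, used here as a stand-in for the paper's
    enhanced scheme (assumption).
    """
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)  # gradient of the quadratic term
        z = a - grad / L          # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a


def st_pyramid_max_pool(codes, positions, extent, levels=2):
    """Max-pool absolute sparse codes over a spatial-temporal pyramid.

    codes:     (n, k) sparse codes of n local features
    positions: (n, 3) feature locations as (x, y, t)
    extent:    (W, H, T) size of the video volume
    Level l splits each axis into 2**l cells (assumed layout).
    """
    n, k = codes.shape
    pooled = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Cell index of every feature along each axis, clipped to the last cell.
        idx = (positions / np.asarray(extent, dtype=float) * cells).astype(int)
        idx = np.minimum(idx, cells - 1)
        flat = (idx[:, 0] * cells + idx[:, 1]) * cells + idx[:, 2]
        out = np.zeros((cells ** 3, k))
        np.maximum.at(out, flat, np.abs(codes))  # per-cell elementwise max
        pooled.append(out.ravel())
    return np.concatenate(pooled)


# Demo on random data (purely illustrative, not from the paper).
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
feats = rng.standard_normal((50, 16))    # 50 local descriptors
extent = (160, 120, 100)
pos = rng.uniform(0, 1, (50, 3)) * np.array(extent)
codes = np.stack([ista_sparse_code(f, D) for f in feats])
video_vec = st_pyramid_max_pool(codes, pos, extent, levels=2)
```

With `levels=2` the pooled vector concatenates 1 + 8 + 64 = 73 cells, each of dictionary size, so `video_vec` has 73 × 32 entries here. A vector of this kind could be passed to a linear classifier; scoring sub-volumes for VOI localization would need the paper's branch-and-bound stage, which is not sketched.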