Multiple scale-specific representations for improved human action recognition

  • Authors:
  • Amir H. Shabani; John S. Zelek; David A. Clausi

  • Affiliations:
  • Vision and Image Processing (VIP) Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada N2L 3G1 and Intelligent Systems Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada N2L 3G1
  • Intelligent Systems Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada N2L 3G1
  • Vision and Image Processing (VIP) Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada N2L 3G1

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

Human action recognition in video is important in many computer vision applications such as automated surveillance. Human actions can be compactly encoded using a sparse set of local spatio-temporal salient features at different scales. Existing bottom-up methods construct a single dictionary of action primitives from the joint features of all scales and hence produce a single action representation. Such a representation cannot fully exploit the complementary characteristics of the motions across different scales. To address this problem, we introduce the concept of learning multiple dictionaries of action primitives at different resolutions and, consequently, multiple scale-specific representations for a given video sample. Using a decoupled fusion of the multiple representations, we improved the human action classification accuracy on realistic benchmark databases by about 5% compared with state-of-the-art methods.
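The abstract describes a bag-of-words pipeline with one dictionary per scale and a decoupled (late) fusion of the resulting scale-specific representations. The following is a minimal sketch of that idea, assuming scikit-learn's KMeans and SVC as stand-ins for the paper's actual codebook-learning and classification choices; the feature dimensions, synthetic descriptors, train/test split, and the probability-averaging fusion rule are illustrative assumptions, not the authors' exact method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Illustrative sizes: 3 spatio-temporal scales, 50-word codebooks, 40 videos,
# 72-dimensional local descriptors (all hypothetical stand-ins).
n_scales, n_words, n_videos, feat_dim = 3, 50, 40, 72
labels = rng.integers(0, 2, n_videos)  # two action classes for the sketch

# Local salient-feature descriptors per (scale, video); random placeholders
# where a real pipeline would extract spatio-temporal interest-point features.
descriptors = [[rng.normal(size=(int(rng.integers(20, 60)), feat_dim))
                for _ in range(n_videos)]
               for _ in range(n_scales)]

# 1) Learn one dictionary of action primitives per scale (k-means codebook),
#    instead of a single dictionary over the pooled features of all scales.
dictionaries = [KMeans(n_clusters=n_words, n_init=10, random_state=0)
                .fit(np.vstack(descriptors[s]))
                for s in range(n_scales)]

def bow_histogram(codebook, desc):
    """Encode one video's descriptors as an L1-normalized bag-of-words histogram."""
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)

# 2) Build one scale-specific representation per video at every scale.
reps = [np.array([bow_histogram(dictionaries[s], d) for d in descriptors[s]])
        for s in range(n_scales)]

# 3) Decoupled (late) fusion: train an independent classifier per scale and
#    average the per-scale class-probability scores at test time.
train, test = np.arange(0, 30), np.arange(30, n_videos)
classifiers = [SVC(kernel="rbf", probability=True, random_state=0)
               .fit(reps[s][train], labels[train])
               for s in range(n_scales)]
fused_scores = np.mean([clf.predict_proba(reps[s][test])
                        for s, clf in enumerate(classifiers)], axis=0)
predictions = fused_scores.argmax(axis=1)
print("fused accuracy on the synthetic split:", (predictions == labels[test]).mean())
```

Keeping a separate codebook and classifier per scale lets each resolution cast its own vote, which is what the decoupled fusion of the scale-specific representations exploits.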