Editors Choice Article: Structured learning of local features for human action classification and localization

Authors:
Tuan Hue Thi;Li Cheng;Jian Zhang;Li Wang;Shinichi Satoh
Affiliations:
National ICT of Australia, Australia and School of Computer Science, University of New South Wales, Australia;Bioinformatics Institute, A*STAR, Singapore;National ICT of Australia, Australia and Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia;Nanjing Forest University, China;National Institute of Informatics, Japan
Venue:
Image and Vision Computing
Year:
2012

Abstract

Human action recognition is a promising yet non-trivial area of computer vision with many potential applications. Recent advances in bag-of-features approaches have brought significant insight into recognizing human actions in complex contexts. It is, however, common practice in the literature to treat an action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits the robustness of traditional approaches in real-life scenarios. In this work, we show that taking into account the global configuration of local features can greatly improve recognition performance. We first introduce a novel feature selection process, the Sparse Hierarchical Bayes Filter, which selects only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning to human action analysis: by representing a human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of action classification and localization. In particular, we tackle action localization in video using structured learning with two alternatives: the Dynamic Conditional Random Field, from a probabilistic perspective, and the Structural Support Vector Machine, from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, on which it proves highly effective and robust compared with bag-of-features methods.
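
To illustrate why an orderless representation can be too coarse, the following minimal sketch compares a bag-of-features histogram with a simple structure-aware descriptor on two toy clips. It is not the method of the paper: the `LocalFeature` type, the `temporal_pair_counts` descriptor, the `max_gap` parameter and the toy data are all hypothetical, chosen only to show that two clips can share a histogram while differing in the temporal arrangement of their features.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class LocalFeature(NamedTuple):
    """Hypothetical quantized space-time interest point (illustration only)."""
    x: float
    y: float
    t: float
    word: int  # index of the visual word (codebook entry)


def bag_of_features(features, vocab_size):
    """Orderless histogram of visual words; positions and times are ignored."""
    counts = Counter(f.word for f in features)
    return [counts.get(w, 0) for w in range(vocab_size)]


def temporal_pair_counts(features, max_gap):
    """Count ordered word pairs (earlier, later) with a time gap in (0, max_gap].

    This is an assumed, deliberately simple stand-in for a structural constraint.
    """
    pairs = Counter()
    ordered = sorted(features, key=lambda f: f.t)
    for a, b in combinations(ordered, 2):
        if 0 < b.t - a.t <= max_gap:
            pairs[(a.word, b.word)] += 1
    return pairs


# Two toy clips containing the same visual words in opposite temporal order.
clip_a = [LocalFeature(10, 20, 0, 0), LocalFeature(10, 15, 1, 1), LocalFeature(10, 10, 2, 2)]
clip_b = [LocalFeature(10, 10, 0, 2), LocalFeature(10, 15, 1, 1), LocalFeature(10, 20, 2, 0)]

hist_a = bag_of_features(clip_a, vocab_size=3)
hist_b = bag_of_features(clip_b, vocab_size=3)
pairs_a = temporal_pair_counts(clip_a, max_gap=1)
pairs_b = temporal_pair_counts(clip_b, max_gap=1)

print(hist_a == hist_b)    # True: the histograms cannot tell the clips apart
print(pairs_a == pairs_b)  # False: the ordered pairs differ
```

In this toy setting the histogram is identical for both clips, whereas the pairwise descriptor separates them. This is the kind of configuration information that structured models such as conditional random fields or structural SVMs are designed to exploit.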