This paper goes beyond recognizing human actions from a fixed view and addresses action recognition from an arbitrary view. We propose a novel learning algorithm, the latent kernelized structural SVM, for view-invariant action recognition; it extends the kernelized structural SVM framework with latent variables. Because the camera position changes and is frequently unknown, we treat the view label of an action as a latent variable and infer it implicitly during both learning and inference. Motivated by the geometric correlation between different views and the semantic correlation between different action classes, we further propose a mid-level correlation feature that describes an action video by the decision values of pre-learned classifiers for every action class from every view. Each decision value captures both the geometric and the semantic correlation between the action video and the corresponding action class from the corresponding view. We then combine the low-level visual cue, the mid-level correlation description, and the high-level label information into a novel nonlinear kernel under the latent kernelized structural SVM framework. Extensive experiments on the multi-view IXMAS and MuHAVi action datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
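The two core ideas of the abstract can be sketched in a few lines: the mid-level correlation feature is the vector of decision values from pre-learned per-(class, view) classifiers, and inference maximizes jointly over the class label and the latent view. The following is a minimal NumPy sketch under assumed linear scoring; all shapes, names, and the random weights standing in for the pre-learned SVM classifiers and the learned latent-model parameters are hypothetical, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_views, d = 5, 4, 32   # hypothetical numbers of classes/views, feature dim

# Stand-ins for pre-learned per-(class, view) classifiers (one linear scorer each).
W = rng.normal(size=(n_classes, n_views, d))

def correlation_feature(x):
    """Mid-level correlation description of a video x: the decision values of
    every (class, view) classifier, flattened into one vector."""
    return (W @ x).ravel()          # length n_classes * n_views

def predict(x, V):
    """Latent-model inference: maximize the score jointly over the class label y
    and the latent view v. V[y, v] is a (hypothetical) learned weight vector
    that scores the correlation feature for the pair (y, v)."""
    phi = correlation_feature(x)
    scores = V @ phi                # shape (n_classes, n_views)
    y, v = np.unravel_index(np.argmax(scores), scores.shape)
    return y, v                     # predicted action class and inferred view

# Toy usage with random data and random latent-model weights.
x = rng.normal(size=d)
V = rng.normal(size=(n_classes, n_views, n_classes * n_views))
y_hat, v_hat = predict(x, V)
```

Note the view label never has to be observed: it is recovered as the maximizing latent variable at prediction time, mirroring the implicit inference described above.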