This paper goes beyond recognizing human actions from a fixed view and addresses action recognition from an arbitrary view. We propose a novel learning algorithm, the latent kernelized structural SVM, for view-invariant action recognition; it extends the kernelized structural SVM framework with latent variables. Because the camera position changes and is frequently unknown, we treat the view label of an action as a latent variable and infer it implicitly during both learning and inference. Motivated by the geometric correlation between different views and the semantic correlation between different action classes, we further propose a mid-level correlation feature that describes an action video by the decision values of pre-learned classifiers for every action class from every view. Each decision value captures both the geometric and the semantic correlation between the action video and the corresponding action class from the corresponding view. We then combine the low-level visual cue, the mid-level correlation description, and the high-level label information into a novel nonlinear kernel under the latent kernelized structural SVM framework. Extensive experiments on the multi-view IXMAS and MuHAVi action datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
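The two core ideas of the abstract can be sketched in a few lines: the mid-level correlation feature is the vector of decision values from pre-learned per-(class, view) classifiers, and inference maximizes jointly over the class label and the latent view. The following is a minimal NumPy sketch under assumed linear scoring; all shapes, names, and the random weights standing in for the pre-learned SVM classifiers and the learned latent-model parameters are hypothetical, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_views, d = 5, 4, 32   # hypothetical numbers of classes/views, feature dim

# Stand-ins for pre-learned per-(class, view) classifiers (one linear scorer each).
W = rng.normal(size=(n_classes, n_views, d))

def correlation_feature(x):
    """Mid-level correlation description of a video x: the decision values of
    every (class, view) classifier, flattened into one vector."""
    return (W @ x).ravel()          # length n_classes * n_views

def predict(x, V):
    """Latent-model inference: maximize the score jointly over the class label y
    and the latent view v. V[y, v] is a (hypothetical) learned weight vector
    that scores the correlation feature for the pair (y, v)."""
    phi = correlation_feature(x)
    scores = V @ phi                # shape (n_classes, n_views)
    y, v = np.unravel_index(np.argmax(scores), scores.shape)
    return y, v                     # predicted action class and inferred view

# Toy usage with random data and random latent-model weights.
x = rng.normal(size=d)
V = rng.normal(size=(n_classes, n_views, n_classes * n_views))
y_hat, v_hat = predict(x, V)
```

Note the view label never has to be observed: it is recovered as the maximizing latent variable at prediction time, mirroring the implicit inference described above.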