Latent Pose Estimator for Continuous Action Recognition

Authors:
Huazhong Ning;Wei Xu;Yihong Gong;Thomas Huang
Affiliations:
ECE, U. of Illinois at Urbana-Champaign, USA;NEC Laboratories America, Inc., USA;NEC Laboratories America, Inc., USA;ECE, U. of Illinois at Urbana-Champaign, USA
Venue:
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part II
Year:
2008

Citing 0
Cited 8

Understanding video events: a survey of methods for automatic interpretation of semantic occurrences in video
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews

Group Action Recognition Using Space-Time Interest Points
ISVC '09 Proceedings of the 5th International Symposium on Advances in Visual Computing: Part II

A survey on vision-based human action recognition
Image and Vision Computing

Advances in view-invariant human motion analysis: a review
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews

Human attributes from 3D pose tracking
ECCV'10 Proceedings of the 11th European conference on computer vision conference on Computer vision: Part III

Human attributes from 3D pose tracking
Computer Vision and Image Understanding

Human action recognition based on skeleton splitting
Expert Systems with Applications: An International Journal

Charting-based subspace learning for video-based human action classification
Machine Vision and Applications

Abstract

Recently, models based on conditional random fields (CRFs) have produced promising results on labeling sequential data in several scientific fields. However, in the vision task of continuous action recognition, the visual feature observations can have hundreds or even thousands of dimensions. This can make parameter estimation severely difficult and even degrade performance. To bridge the gap between the high-dimensional observations and the random fields, we propose a novel model that replaces the observation layer of a traditional random field model with a latent pose estimator. In the training stage, the human pose is not observed in the action data, and the latent pose estimator is learned under the supervision of the labeled action data instead of image-to-pose data. The advantage of this model is twofold. First, it learns to convert the high-dimensional observations into more compact and informative representations. Second, it enables transfer learning to fully utilize existing knowledge and data on the image-to-pose relationship. The parameters of the latent pose estimator and the random field are jointly optimized through a gradient ascent algorithm. Our approach is tested on HumanEva [1], a publicly available dataset. The experiments show that our approach can improve recognition accuracy over the standard CRF model and its variations. The performance can be improved significantly further by using additional image-to-pose data for training. Our experiments also show that the model trained on HumanEva can generalize to different environments and human subjects.
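
The structure described in the abstract can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the form of the pose estimator or of the potentials, so the linear map `W` (observation to latent pose), the label weights `U`, the transition table `T`, and all function names here are assumptions chosen for illustration. The sketch computes the conditional log-likelihood of a label sequence for a linear-chain random field whose unary scores depend on the observations only through the low-dimensional latent pose. This is the kind of objective that joint gradient ascent over `W`, `U`, and `T` would maximize.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def estimate_pose(W, x):
    """Assumed linear latent pose estimator: z = W x (low-dimensional)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def unary_scores(U, z):
    """Assumed unary score of each action label given latent pose z: U[y] . z."""
    return [sum(u * zi for u, zi in zip(row, z)) for row in U]


def sequence_score(labels, unaries, T):
    """Unnormalized log-score of one label sequence in a linear chain."""
    score = unaries[0][labels[0]]
    for t in range(1, len(labels)):
        score += T[labels[t - 1]][labels[t]] + unaries[t][labels[t]]
    return score


def log_partition(unaries, T):
    """Forward algorithm in the log domain: log of the sum over all sequences."""
    n_labels = len(T)
    alpha = list(unaries[0])
    for t in range(1, len(unaries)):
        alpha = [
            unaries[t][y]
            + logsumexp([alpha[p] + T[p][y] for p in range(n_labels)])
            for y in range(n_labels)
        ]
    return logsumexp(alpha)


def log_likelihood(W, U, T, xs, labels):
    """log p(labels | xs), with observations routed through the latent pose."""
    unaries = [unary_scores(U, estimate_pose(W, x)) for x in xs]
    return sequence_score(labels, unaries, T) - log_partition(unaries, T)


def total_probability(W, U, T, xs):
    """Brute-force sanity check: probabilities of all label sequences sum to 1."""
    n_labels = len(T)
    return sum(
        math.exp(log_likelihood(W, U, T, xs, list(seq)))
        for seq in product(range(n_labels), repeat=len(xs))
    )
```

In this sketch the random field never sees the raw observation `x`, only the latent pose `W x`, which mirrors the idea of replacing the observation layer. Because the pose estimator has its own parameters, it could in principle also be fit on separate image-to-pose data, which is the transfer-learning route the abstract mentions.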