Exploring the spatial hierarchy of mixture models for human pose estimation

Authors:
Yuandong Tian;C. Lawrence Zitnick;Srinivasa G. Narasimhan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Microsoft Research, Redmond, WA;Carnegie Mellon University, Pittsburgh, PA
Venue:
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V
Year:
2012

Abstract

Human pose estimation requires a versatile yet well-constrained spatial model that groups locally ambiguous parts into a globally consistent hypothesis. Previous work uses either local deformable models that deviate from a fixed template or a global mixture representation in pose space. In this paper, we propose a new hierarchical spatial model that captures an exponential number of poses with a compact mixture representation on each part. Using latent nodes, it represents high-order spatial relationships among parts while still allowing exact inference. Unlike recent hierarchical models that associate each latent node with a mixture of appearance templates (such as HoG), we use the hierarchical structure as a pure spatial prior, avoiding the large and often confounding appearance space. We verify the effectiveness of this model in three ways. First, samples drawn from our model represent human-like poses, showing its ability to capture high-order dependencies among parts. Second, our model reconstructs unseen poses more accurately than a nearest-neighbor pose representation. Finally, our model achieves state-of-the-art performance on three challenging datasets and substantially outperforms recent hierarchical models.
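
The link between a compact per-part mixture, an exponential number of poses, and exact inference can be illustrated with a generic max-sum dynamic program. The sketch below is not the authors' implementation. It assumes the hierarchy is a tree (the abstract does not say so explicitly), and the node layout, the three states per node, and all scores are made-up toy values. The helper names `max_sum_tree`, `total_score` and `brute_force` are likewise hypothetical.

```python
from itertools import product


def max_sum_tree(children, unary, pairwise, root=0):
    """Exact best-scoring assignment on a tree via max-sum dynamic programming.

    children: dict mapping a node to the list of its child nodes
    unary:    dict mapping a node to a list of scores, one per state
    pairwise: dict mapping (parent, child) to a matrix [parent_state][child_state]
    Returns (best_score, assignment), where assignment maps node -> state.
    """
    best = {}     # best[node][s]: best score of the subtree when node is in state s
    argbest = {}  # argbest[(node, child)][s]: best child state when node is in state s

    def up(node):
        scores = list(unary[node])
        for child in children.get(node, []):
            up(child)
            table = pairwise[(node, child)]
            choice = []
            for s in range(len(scores)):
                cand = [table[s][t] + best[child][t] for t in range(len(best[child]))]
                t_star = max(range(len(cand)), key=cand.__getitem__)
                scores[s] += cand[t_star]
                choice.append(t_star)
            argbest[(node, child)] = choice
        best[node] = scores

    up(root)
    root_state = max(range(len(best[root])), key=best[root].__getitem__)

    # Backtrack from the root to read off the state of every descendant.
    assignment = {root: root_state}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            assignment[child] = argbest[(node, child)][assignment[node]]
            stack.append(child)
    return best[root][root_state], assignment


def total_score(children, unary, pairwise, assignment):
    """Score of one full assignment: unary terms plus parent-child terms."""
    score = sum(unary[n][assignment[n]] for n in unary)
    for parent, kids in children.items():
        for child in kids:
            score += pairwise[(parent, child)][assignment[parent]][assignment[child]]
    return score


def brute_force(children, unary, pairwise):
    """Enumerate every joint assignment; exponential in the number of nodes."""
    nodes = sorted(unary)
    configs = list(product(*(range(len(unary[n])) for n in nodes)))
    scores = [total_score(children, unary, pairwise, dict(zip(nodes, c))) for c in configs]
    return max(scores), len(configs)


# Toy hierarchy (assumed): node 0 is a latent node, nodes 1-3 stand in for parts.
CHILDREN = {0: [1, 2], 1: [3]}
UNARY = {0: [0, 0, 0], 1: [1, 3, 2], 2: [2, 0, 1], 3: [0, 1, 4]}
AGREE = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]  # rewards matching states
PAIRWISE = {(0, 1): AGREE, (0, 2): AGREE, (1, 3): [[0, 1, 0], [1, 0, 0], [0, 0, 3]]}

if __name__ == "__main__":
    dp_score, dp_assignment = max_sum_tree(CHILDREN, UNARY, PAIRWISE)
    bf_score, n_configs = brute_force(CHILDREN, UNARY, PAIRWISE)
    print("joint configurations:", n_configs)
    print("dynamic programming and enumeration agree:", dp_score == bf_score)
    print("best assignment:", dp_assignment)
```

With K states on each of N nodes there are K^N joint configurations, yet the dynamic program touches each parent-child state pair only once, so its cost grows as N·K² rather than K^N. The brute-force enumeration is included only to check the result on this small example.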