Learning effective human pose estimation from inaccurate annotation

Authors:
S. Johnson;M. Everingham
Affiliations:
Sch. of Comput., Univ. of Leeds, Leeds, UK;Sch. of Comput., Univ. of Leeds, Leeds, UK
Venue:
CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Year:
2011

Cited 6

Object detection using strongly-supervised deformable part models
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I

Exploring the spatial hierarchy of mixture models for human pose estimation
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

People watching: human actions as a cue for single view geometry
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

Appearance sharing for collective human pose estimation
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part I

Robust human body segmentation based on part appearance and spatial constraint
Neurocomputing

Learning visual symbols for parsing human poses in images
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Abstract

The task of 2-D articulated human pose estimation in natural images is extremely challenging due to the high level of variation in human appearance. These variations arise from different clothing, anatomy, imaging conditions and the large number of poses a human body can adopt. Recent work has shown state-of-the-art results by partitioning the pose space and using strong nonlinear classifiers such that the pose dependence and multi-modal nature of body part appearance can be captured. We propose to extend these methods to handle much larger quantities of training data, an order of magnitude larger than current datasets, and show how to use Amazon Mechanical Turk and a latent annotation update scheme to achieve high-quality annotations at low cost. We demonstrate a significant increase in pose estimation accuracy, while simultaneously reducing computational expense by a factor of 10, and contribute a dataset of 10,000 highly articulated poses.