Real-time human pose recognition in parts from single depth images

Authors:
Jamie Shotton;Toby Sharp;Alex Kipman;Andrew Fitzgibbon;Mark Finocchio;Andrew Blake;Mat Cook;Richard Moore
Affiliations:
Microsoft Research, Cambridge, UK;Microsoft Research, Cambridge, UK;Xbox Incubation;Microsoft Research, Cambridge, UK;Xbox Incubation;Microsoft Research, Cambridge, UK;Microsoft Research, Cambridge, UK;ST-Ericsson
Venue:
Communications of the ACM
Year:
2013

Abstract

We propose a new method to quickly and accurately predict human pose (the 3D positions of body joints) from a single depth image, without depending on information from preceding frames. Our approach is strongly rooted in current object recognition strategies. By designing an intermediate representation in terms of body parts, we transform the difficult pose estimation problem into a simpler per-pixel classification problem, for which efficient machine learning techniques exist. By using computer graphics to synthesize a very large dataset of training image pairs, we train a classifier that estimates body part labels from test images invariant to pose, body shape, clothing, and other irrelevant variation. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs in under 5 ms on the Xbox 360. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
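
The final step of the abstract (reprojecting per-pixel body part labels into 3D and finding local modes) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), uses weighted mean shift with a Gaussian kernel as one standard way to find a local mode, and treats the summed kernel weight at the mode as the confidence score. The function names, the bandwidth, and the sample values are all illustrative.

```python
import math


def reproject(u, v, depth, fx=575.0, fy=575.0, cx=320.0, cy=240.0):
    """Map a pixel (u, v) with depth in metres to a 3D camera-space point.

    The intrinsics are hypothetical defaults, not values from the paper.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def weighted_mean_shift(points, weights, start, bandwidth=0.06,
                        iters=50, tol=1e-6):
    """Climb from `start` to a local mode of a weighted Gaussian density.

    points  : list of (x, y, z) tuples, e.g. reprojected pixels
    weights : per-point weights, e.g. the classifier's probability that
              the pixel belongs to the body part of interest (assumption)
    Returns (mode, confidence), where confidence is the summed kernel
    weight at the mode.
    """
    mode = list(start)
    density = 0.0
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        density = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, mode))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            density += k
            for i in range(3):
                num[i] += k * p[i]
        if density == 0.0:
            break  # no support near the current estimate
        new_mode = [n / density for n in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_mode, mode)))
        mode = new_mode
        if shift < tol:
            break
    return tuple(mode), density


if __name__ == "__main__":
    # Five points clustered near (0, 0, 2) plus one low-weight outlier.
    pts = [(0.0, 0.0, 2.0), (0.02, 0.0, 2.0), (-0.02, 0.0, 2.0),
           (0.0, 0.02, 2.0), (0.0, -0.02, 2.0), (1.0, 1.0, 3.0)]
    wts = [0.9, 0.8, 0.8, 0.8, 0.8, 0.3]
    joint, confidence = weighted_mean_shift(pts, wts, start=pts[0])
    print(joint, confidence)
```

Starting the search from several high-probability pixels and merging the modes that coincide would yield multiple proposals per body part, each carrying its own confidence.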