Real-time upper body detection and 3D pose estimation in monoscopic images

Authors:
Antonio S. Micilotta;Eng-Jon Ong;Richard Bowden
Affiliations:
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, United Kingdom;Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, United Kingdom;Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, United Kingdom
Venue:
ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part III
Year:
2006

Abstract

This paper presents a novel solution to the difficult task of both detecting humans and estimating their 3D pose in monoscopic images. The approach consists of two parts. First, the location of a human is identified by a probabilistic assembly of detected body parts. Detectors for the face, torso and hands are learnt using AdaBoost. A pose likelihood is then obtained from an a priori mixture model of body configuration, with candidate configurations assembled from the available evidence using RANSAC. Once a human has been detected, the location is used to initialise a matching algorithm that matches the silhouette and edge map of the subject against a 3D model. This is done efficiently using chamfer matching, integral images and the pose estimate from the initial detection stage. We demonstrate the approach on large, cluttered natural images and at near frame-rate operation (16 fps) on lower-resolution video streams.
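
To make the chamfer matching step mentioned in the abstract more concrete, here is a minimal sketch of the general technique; it is not the authors' implementation. It assumes a binary edge map stored as nested lists, uses a city-block distance transform as a simple stand-in for whatever distance approximation the paper uses, and scores a template by the mean distance under its edge points. The names `distance_transform`, `chamfer_score` and `best_offset`, and the out-of-bounds penalty, are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, column)
INF = 10**9


def distance_transform(edges: Sequence[Sequence[int]]) -> List[List[int]]:
    """City-block distance from every pixel to its nearest edge pixel."""
    h, w = len(edges), len(edges[0])
    dist = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def chamfer_score(dist: Sequence[Sequence[int]],
                  template: Sequence[Point],
                  offset: Point) -> float:
    """Mean distance under the template's edge points; lower is a better match."""
    h, w = len(dist), len(dist[0])
    oy, ox = offset
    total = 0
    for ty, tx in template:
        y, x = ty + oy, tx + ox
        # Assumption: points outside the image receive a fixed penalty of h + w.
        total += dist[y][x] if 0 <= y < h and 0 <= x < w else h + w
    return total / len(template)


def best_offset(dist: Sequence[Sequence[int]],
                template: Sequence[Point],
                offsets: Sequence[Point]) -> Point:
    """Exhaustively pick the candidate offset with the lowest chamfer score."""
    return min(offsets, key=lambda o: chamfer_score(dist, template, o))


# Toy example: an L-shaped edge in a 5x5 image and the same shape as a template.
edges = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
template = [(0, 0), (1, 0), (2, 0), (2, 1)]
dist = distance_transform(edges)
candidates = [(y, x) for y in range(5) for x in range(5)]
match = best_offset(dist, template, candidates)
```

Because the distance transform is computed once per image, each candidate placement costs only one lookup per template point. In the setting the abstract describes, the candidate placements would presumably be restricted to the neighbourhood of the initial detection rather than searched exhaustively as in this toy example.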