Neural network-based head pose estimation and multi-view fusion

  • Authors:
  • Michael Voit; Kai Nickel; Rainer Stiefelhagen

  • Affiliations:
  • Interactive Systems Lab, Universität Karlsruhe, Germany (all authors)

  • Venue:
  • CLEAR'06 Proceedings of the 1st international evaluation conference on Classification of Events, Activities and Relationships
  • Year:
  • 2006

Abstract

In this paper, we present two systems that were used for head pose estimation during the CLEAR'06 Evaluation. We participated in two tasks: (1) estimating both pan and tilt orientation on synthetic, high-resolution head captures, and (2) estimating horizontal head orientation only on real seminar recordings that were captured with multiple cameras from different viewing angles. In both systems, we used a neural network to estimate the person's head orientation. For the seminar recordings, a Bayes filter framework additionally provides a statistical fusion scheme, integrating every camera view into one joint hypothesis. On the monocular, high-resolution task, we achieved a mean error of 12.3° for horizontal head orientation estimation and 12.77° for vertical orientation estimation. On the multi-view seminar recordings, our system correctly identified head orientation (one of eight classes) in 34.9% of frames; if neighbouring classes were also counted as correct, 72.9% of frames were classified correctly.
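The abstract does not spell out the fusion scheme, but the described setup (per-camera neural network outputs over eight discrete horizontal orientation classes, combined by a Bayes filter into one joint hypothesis) can be sketched as a discrete Bayes filter. The helper names, the camera-offset mapping, and the simple diffusion transition model below are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

N_CLASSES = 8  # horizontal orientation discretized into 45-degree bins

def shift_to_world(likelihood, camera_bin_offset):
    """Rotate a camera-relative likelihood into the world frame by
    shifting class indices (hypothetical mapping; the paper's exact
    camera-to-world transform is not given in the abstract)."""
    return np.roll(likelihood, camera_bin_offset)

def predict(belief, diffusion=0.1):
    """Prediction step: a simple transition model that lets the head
    drift to neighbouring orientation classes with small probability."""
    left = np.roll(belief, 1)
    right = np.roll(belief, -1)
    return (1 - 2 * diffusion) * belief + diffusion * (left + right)

def fuse_views(prior, camera_likelihoods, camera_offsets):
    """Update step: multiply the prior belief with each camera's
    world-frame likelihood, then renormalize to a joint hypothesis."""
    belief = prior.copy()
    for lik, off in zip(camera_likelihoods, camera_offsets):
        belief *= shift_to_world(lik, off)
    return belief / belief.sum()

# Usage: two cameras with made-up network outputs, uniform prior.
prior = np.full(N_CLASSES, 1.0 / N_CLASSES)
cam1 = np.array([0.5, 0.2, 0.1, 0.05, 0.05, 0.03, 0.03, 0.04])
cam2 = np.array([0.4, 0.3, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03])
belief = fuse_views(predict(prior), [cam1, cam2], [0, 2])
print(int(np.argmax(belief)))  # index of the most likely orientation class
```

Each camera only sees the head from its own angle, so its class likelihoods are rotated into a common world frame before multiplication; the product of independent view likelihoods with the predicted prior yields the fused posterior over the eight classes.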