Neural network-based head pose estimation and multi-view fusion

  • Authors:
  • Michael Voit; Kai Nickel; Rainer Stiefelhagen

  • Affiliations:
  • Interactive Systems Lab, Universität Karlsruhe, Germany (all authors)

  • Venue:
  • CLEAR'06 Proceedings of the 1st international evaluation conference on Classification of Events, Activities and Relationships
  • Year:
  • 2006

Abstract

In this paper, we present two systems that were used for head pose estimation during the CLEAR'06 Evaluation. We participated in two tasks: (1) estimating both pan and tilt orientation on synthetic, high-resolution head captures, and (2) estimating horizontal head orientation only on real seminar recordings that were captured with multiple cameras from different viewing angles. In both systems, we used a neural network to estimate the person's head orientation. For the seminar recordings, a Bayes filter framework additionally provides a statistical fusion scheme, integrating every camera view into one joint hypothesis. On the monocular, high-resolution task, we achieved a mean error of 12.3° for horizontal head orientation estimation and 12.77° for vertical orientation estimation. On the multi-view seminar recordings, our system correctly identified head orientation (one of eight classes) in 34.9% of frames; if neighbouring classes were also counted as correct, 72.9% of frames were classified correctly.
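The abstract does not spell out the fusion scheme, but the described setup (per-camera neural network outputs over eight discrete horizontal orientation classes, combined by a Bayes filter into one joint hypothesis) can be sketched as a discrete Bayes filter. The helper names, the camera-offset mapping, and the simple diffusion transition model below are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

N_CLASSES = 8  # horizontal orientation discretized into 45-degree bins

def shift_to_world(likelihood, camera_bin_offset):
    """Rotate a camera-relative likelihood into the world frame by
    shifting class indices (hypothetical mapping; the paper's exact
    camera-to-world transform is not given in the abstract)."""
    return np.roll(likelihood, camera_bin_offset)

def predict(belief, diffusion=0.1):
    """Prediction step: a simple transition model that lets the head
    drift to neighbouring orientation classes with small probability."""
    left = np.roll(belief, 1)
    right = np.roll(belief, -1)
    return (1 - 2 * diffusion) * belief + diffusion * (left + right)

def fuse_views(prior, camera_likelihoods, camera_offsets):
    """Update step: multiply the prior belief with each camera's
    world-frame likelihood, then renormalize to a joint hypothesis."""
    belief = prior.copy()
    for lik, off in zip(camera_likelihoods, camera_offsets):
        belief *= shift_to_world(lik, off)
    return belief / belief.sum()

# Usage: two cameras with made-up network outputs, uniform prior.
prior = np.full(N_CLASSES, 1.0 / N_CLASSES)
cam1 = np.array([0.5, 0.2, 0.1, 0.05, 0.05, 0.03, 0.03, 0.04])
cam2 = np.array([0.4, 0.3, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03])
belief = fuse_views(predict(prior), [cam1, cam2], [0, 2])
print(int(np.argmax(belief)))  # index of the most likely orientation class
```

Each camera only sees the head from its own angle, so its class likelihoods are rotated into a common world frame before multiplication; the product of independent view likelihoods with the predicted prior yields the fused posterior over the eight classes.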