Tracking Head Yaw by Interpolation of Template Responses

  • Authors: Mario Romero; Aaron Bobick
  • Affiliations: Georgia Institute of Technology, Atlanta
  • Venue: CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 5
  • Year: 2004

Abstract

We propose an appearance-based machine learning architecture that estimates and tracks, in real time, head yaw over a large range, given a single non-calibrated, monocular, grayscale, low-resolution image sequence of the head. The architecture is composed of five parallel template detectors, a Radial Basis Function network, and two Kalman filters. The template detectors are five view-specific images of the head, spanning full profile to full profile in discrete steps of 45 degrees. The Radial Basis Function network interpolates the response vector obtained from the normalized correlation of the input image with the five template detectors. The first Kalman filter models the position and velocity of the response vector in five-dimensional space; the second is a running average that filters the scalar output of the network. We assume that the head image has been closely detected and segmented, that it undergoes only limited roll and pitch, and that there are no sharp contrasts in illumination. The architecture is person-independent and is robust to changes in appearance, gesture, and global illumination. The goals of this paper are, first, to measure the performance of the architecture; second, to assess the impact that the temporal information gained from video has on accuracy and stability; and third, to determine the effects of relaxing our assumptions.
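
The abstract describes a concrete per-frame pipeline: correlate the head image against five view-specific templates, interpolate the resulting five-dimensional response vector to a scalar yaw with an RBF network, and smooth the estimates over time. The Python/NumPy sketch below illustrates one way such a pipeline could be assembled. It is a minimal illustration under stated assumptions, not the authors' implementation: the template yaw angles in TEMPLATE_YAWS, the Gaussian basis with ridge-regularized least-squares training, the constant-velocity state model, and the exponential running average (standing in for the paper's scalar running-average filter) are all assumed choices.

```python
import numpy as np

# Assumed yaw angles (degrees) of the five view-specific templates,
# spanning full profile to full profile in 45-degree steps.
TEMPLATE_YAWS = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])


def normalized_correlation(image, template):
    """Zero-mean normalized correlation of two equal-size grayscale patches."""
    a = image.astype(float).ravel() - image.mean()
    b = template.astype(float).ravel() - template.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def response_vector(head_image, templates):
    """Five-dimensional response vector from the parallel template detectors."""
    return np.array([normalized_correlation(head_image, t) for t in templates])


class ResponseKalman:
    """Constant-velocity Kalman filter over the 5-D response vector
    (state = [responses, velocities]); an assumed stand-in for the
    paper's first Kalman filter."""

    def __init__(self, dt=1.0, q=1e-3, r=1e-2, n=5):
        self.F = np.eye(2 * n)                             # state transition
        self.F[:n, n:] = dt * np.eye(n)
        self.H = np.hstack([np.eye(n), np.zeros((n, n))])  # observe responses
        self.Q = q * np.eye(2 * n)                         # process noise
        self.R = r * np.eye(n)                             # measurement noise
        self.x = np.zeros(2 * n)
        self.P = np.eye(2 * n)

    def update(self, z):
        # Predict, then correct with the measured response vector z.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(self.P.shape[0]) - K @ self.H) @ self.P
        return self.x[:len(z)]                             # filtered responses


class RBFYawInterpolator:
    """Toy Gaussian RBF network mapping 5-D responses to a scalar yaw.
    Output weights are fit by ridge-regularized least squares on labelled
    (response, yaw) pairs; the training procedure is an assumption."""

    def __init__(self, centers, width=0.5, reg=1e-3):
        self.centers = np.asarray(centers)   # (n_centers, 5) response prototypes
        self.width = width
        self.reg = reg
        self.weights = None

    def _phi(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, yaws):
        P = self._phi(np.asarray(X))
        A = P.T @ P + self.reg * np.eye(P.shape[1])
        self.weights = np.linalg.solve(A, P.T @ np.asarray(yaws))

    def predict(self, x):
        return float(self._phi(np.asarray(x)[None, :]) @ self.weights)


class RunningAverage:
    """Exponential running average smoothing the network's scalar yaw output."""

    def __init__(self, alpha=0.3):
        self.alpha, self.value = alpha, None

    def update(self, x):
        self.value = x if self.value is None else (
            self.alpha * x + (1.0 - self.alpha) * self.value)
        return self.value
```

Per frame, such a pipeline would compute r = response_vector(frame, templates), filter it with ResponseKalman.update, map it to an angle with RBFYawInterpolator.predict, and smooth the result with RunningAverage.update; the paper's stated goal of assessing the value of temporal information corresponds to comparing this filtered output against the raw per-frame prediction.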