Tracking Head Yaw by Interpolation of Template Responses

  • Authors: Mario Romero; Aaron Bobick
  • Affiliations: Georgia Institute of Technology, Atlanta
  • Venue: CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 5
  • Year: 2004

Abstract

We propose an appearance-based machine learning architecture that estimates and tracks, in real time, head yaw over a large range, given a single non-calibrated, monocular, grayscale, low-resolution image sequence of the head. The architecture is composed of five parallel template detectors, a Radial Basis Function network, and two Kalman filters. The template detectors are five view-specific images of the head, spanning full profile to full profile in discrete steps of 45 degrees. The Radial Basis Function network interpolates the response vector obtained from the normalized correlation of the input image with the five template detectors. The first Kalman filter models the position and velocity of the response vector in five-dimensional space; the second is a running average that filters the scalar output of the network. We assume that the head image has been closely detected and segmented, that it undergoes only limited roll and pitch, and that there are no sharp contrasts in illumination. The architecture is person-independent and is robust to changes in appearance, gesture, and global illumination. The goals of this paper are, first, to measure the performance of the architecture; second, to assess the impact that the temporal information gained from video has on accuracy and stability; and third, to determine the effects of relaxing our assumptions.
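
The abstract describes a concrete per-frame pipeline: correlate the head image against five view-specific templates, interpolate the resulting five-dimensional response vector to a scalar yaw with an RBF network, and smooth the estimates over time. The Python/NumPy sketch below illustrates one way such a pipeline could be assembled. It is a minimal illustration under stated assumptions, not the authors' implementation: the template yaw angles in TEMPLATE_YAWS, the Gaussian basis with ridge-regularized least-squares training, the constant-velocity state model, and the exponential running average (standing in for the paper's scalar running-average filter) are all assumed choices.

```python
import numpy as np

# Assumed yaw angles (degrees) of the five view-specific templates,
# spanning full profile to full profile in 45-degree steps.
TEMPLATE_YAWS = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])


def normalized_correlation(image, template):
    """Zero-mean normalized correlation of two equal-size grayscale patches."""
    a = image.astype(float).ravel() - image.mean()
    b = template.astype(float).ravel() - template.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def response_vector(head_image, templates):
    """Five-dimensional response vector from the parallel template detectors."""
    return np.array([normalized_correlation(head_image, t) for t in templates])


class ResponseKalman:
    """Constant-velocity Kalman filter over the 5-D response vector
    (state = [responses, velocities]); an assumed stand-in for the
    paper's first Kalman filter."""

    def __init__(self, dt=1.0, q=1e-3, r=1e-2, n=5):
        self.F = np.eye(2 * n)                             # state transition
        self.F[:n, n:] = dt * np.eye(n)
        self.H = np.hstack([np.eye(n), np.zeros((n, n))])  # observe responses
        self.Q = q * np.eye(2 * n)                         # process noise
        self.R = r * np.eye(n)                             # measurement noise
        self.x = np.zeros(2 * n)
        self.P = np.eye(2 * n)

    def update(self, z):
        # Predict, then correct with the measured response vector z.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(self.P.shape[0]) - K @ self.H) @ self.P
        return self.x[:len(z)]                             # filtered responses


class RBFYawInterpolator:
    """Toy Gaussian RBF network mapping 5-D responses to a scalar yaw.
    Output weights are fit by ridge-regularized least squares on labelled
    (response, yaw) pairs; the training procedure is an assumption."""

    def __init__(self, centers, width=0.5, reg=1e-3):
        self.centers = np.asarray(centers)   # (n_centers, 5) response prototypes
        self.width = width
        self.reg = reg
        self.weights = None

    def _phi(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, yaws):
        P = self._phi(np.asarray(X))
        A = P.T @ P + self.reg * np.eye(P.shape[1])
        self.weights = np.linalg.solve(A, P.T @ np.asarray(yaws))

    def predict(self, x):
        return float(self._phi(np.asarray(x)[None, :]) @ self.weights)


class RunningAverage:
    """Exponential running average smoothing the network's scalar yaw output."""

    def __init__(self, alpha=0.3):
        self.alpha, self.value = alpha, None

    def update(self, x):
        self.value = x if self.value is None else (
            self.alpha * x + (1.0 - self.alpha) * self.value)
        return self.value
```

Per frame, such a pipeline would compute r = response_vector(frame, templates), filter it with ResponseKalman.update, map it to an angle with RBFYawInterpolator.predict, and smooth the result with RunningAverage.update; the paper's stated goal of assessing the value of temporal information corresponds to comparing this filtered output against the raw per-frame prediction.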