Head Pose Estimation for Video-Conferencing with Multiple Cameras and Microphones

  • Authors:
  • Ce Wang; Michael Brandstein

  • Venue:
  • ICMI '00 Proceedings of the Third International Conference on Advances in Multimodal Interfaces
  • Year:
  • 2000

Abstract

An automatic video-conferencing system is proposed that employs acoustic source localization, video face tracking, and head pose estimation. The audio portion of the system provides the initial localization of the talkers, and the video component tracks the talkers using source motion, contour geometry, color data, and simple facial features. Decisions about which camera to use are based on an estimate of the head's orientation. This head pose estimation is achieved with a very general head model that employs hairline features and a learned network classification procedure powered by Support Vector Machines. The procedure is capable of accurately evaluating head orientations over a complete 360-degree interval. By relying on a facial criterion that is easily extracted from video images acquired across a range of lighting and zoom conditions, the estimator is designed to be effective in practical situations such as those encountered in video-conferencing or surveillance scenarios.
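As a rough illustration of how a head orientation estimate could drive camera selection, the sketch below picks the camera that the talker is facing most directly. It is not the authors' method: the planar room model, the camera layout, and the names `angular_distance` and `select_camera` are hypothetical. The head position is assumed to come from the audio/video localization stage and the yaw angle from a pose estimator such as the SVM-based one described in the abstract.

```python
import math


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def select_camera(head_yaw_deg, head_xy, cameras):
    """Return the name of the camera the head is facing most directly.

    Hypothetical interface (assumption, not from the paper):
      head_yaw_deg: facing direction of the head in the room frame,
                    degrees counter-clockwise from +x (full 0..360 range).
      head_xy:      (x, y) talker position, e.g. from source localization.
      cameras:      dict mapping camera name -> (x, y) camera position.
    """
    best_name, best_err = None, float("inf")
    for name, (cx, cy) in cameras.items():
        # Bearing from the head to this camera, same angle convention as yaw.
        bearing = math.degrees(math.atan2(cy - head_xy[1], cx - head_xy[0]))
        # How far the head would have to turn to look straight at the camera.
        err = angular_distance(head_yaw_deg, bearing)
        if err < best_err:
            best_name, best_err = name, err
    return best_name


# Example: four hypothetical cameras around a talker at the origin.
cams = {"north": (0, 5), "east": (5, 0), "south": (0, -5), "west": (-5, 0)}
print(select_camera(200, (0, 0), cams))  # head turned toward -x, so "west"
```

Because the yaw is compared on the circle, a head turned to 350 degrees is correctly treated as 10 degrees away from a camera at 0 degrees. A practical system would likely also add hysteresis so the view does not flicker when the talker faces a point between two cameras.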