A multimodal speaker detection and tracking system for teleconferencing

  • Authors:
  • Billibon H. Yoshimi; Gopal S. Pingali

  • Affiliations:
  • IBM T. J. Watson Research Lab, Yorktown Heights, NY; IBM T. J. Watson Research Lab, Hawthorne, NY

  • Venue:
  • Proceedings of the tenth ACM international conference on Multimedia
  • Year:
  • 2002

Abstract

A serious problem in both audio and video conferencing facilities available today is the difficulty in determining who is speaking among a large number of participants. There is a strong need for developing meeting room infrastructure and teleconference facilities that improve the sense of presence and participation experienced in remote meetings. We present a distributed multimodal tracking system that uses multiple cameras and microphones to automatically select the current speaker among multiple meeting participants. The system actively obtains and transmits video showing a good view of the selected speaker. The tracking system is integrated into a web-based video conferencing application that connects seven meeting rooms around the globe. An important part of designing such a system is to determine sensor placement and configuration through systematic experiments in the actual rooms where the system is deployed.