Audio-visual tracking for natural interactivity

  • Authors:
  • Gopal Pingali, Gamze Tunali, Ingrid Carlbom

  • Affiliations:
  • Bell Laboratories, Lucent Technologies, 600 Mountain Avenue, Murray Hill, NJ (all authors)

  • Venue:
  • MULTIMEDIA '99: Proceedings of the Seventh ACM International Conference on Multimedia (Part 1)
  • Year:
  • 1999

Abstract

The goal in user interfaces is natural interactivity, unencumbered by sensor and display technology. In this paper, we propose that a multi-modal approach using inverse modeling techniques from computer vision, speech recognition, and acoustics can result in such interfaces. In particular, we demonstrate a system for audio-visual tracking, showing that it is more robust, more accurate, and more compact, and yields more information, than tracking with a single modality. We also demonstrate how such a system can be used to find the talker in a group of individuals and to render 3D scenes to the user.
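
The abstract does not spell out the tracking machinery, but a minimal sketch can illustrate two representative ingredients: estimating a talker's bearing acoustically from a microphone pair via the time delay of arrival (TDOA), and fusing that estimate with a vision-based bearing. The sketch below is in Python with NumPy; the microphone spacing, sample rate, simulated signals, per-modality variances, and the inverse-variance fusion rule are all illustrative assumptions, not the authors' method.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
SPEED_OF_SOUND = 343.0   # m/s (air, ~20 C)
MIC_SPACING = 0.3        # m between the two microphones
SAMPLE_RATE = 16000      # Hz

def tdoa_bearing(sig_left, sig_right):
    """Estimate talker bearing (radians) from a two-microphone pair.

    The peak of the cross-correlation gives the time delay of arrival
    (TDOA); under a far-field assumption, delay = spacing * sin(angle) / c.
    """
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_right) - 1)   # delay in samples
    delay = lag / SAMPLE_RATE                           # delay in seconds
    sin_angle = np.clip(delay * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.arcsin(sin_angle))

def fuse_bearings(audio_bearing, audio_var, visual_bearing, visual_var):
    """Inverse-variance weighting: the more certain modality dominates."""
    w_a, w_v = 1.0 / audio_var, 1.0 / visual_var
    return (w_a * audio_bearing + w_v * visual_bearing) / (w_a + w_v)

if __name__ == "__main__":
    # Synthetic demo: a talker at ~20 degrees off the microphone axis.
    true_angle = np.deg2rad(20.0)
    delay_samples = int(round(MIC_SPACING * np.sin(true_angle)
                              / SPEED_OF_SOUND * SAMPLE_RATE))

    rng = np.random.default_rng(0)
    src = rng.standard_normal(SAMPLE_RATE // 4)  # 0.25 s of noise as stand-in speech
    right = src
    left = np.roll(src, delay_samples)           # same signal, delayed at the left mic

    audio_est = tdoa_bearing(left, right)
    visual_est = np.deg2rad(22.0)                # stand-in for a vision tracker's output
    fused = fuse_bearings(audio_est, 0.02, visual_est, 0.01)

    print(f"audio bearing : {np.rad2deg(audio_est):5.1f} deg")
    print(f"visual bearing: {np.rad2deg(visual_est):5.1f} deg")
    print(f"fused bearing : {np.rad2deg(fused):5.1f} deg")
```

On this synthetic input the audio-only estimate lands near the true 20 degrees, and the fused estimate is pulled toward the (assumed lower-variance) visual measurement, mirroring the abstract's claim that combining modalities yields a more robust estimate than either modality alone.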