Audio-video array source separation for perceptual user interfaces

  • Authors:
  • Kevin Wilson; Neal Checka; David Demirdjian; Trevor Darrell

  • Affiliations:
  • MIT Artificial Intelligence Lab, Cambridge, MA (all authors)

  • Venue:
  • Proceedings of the 2001 workshop on Perceptive user interfaces
  • Year:
  • 2001


Abstract

Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
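The abstract does not specify the array-steering algorithm, but the classic way to focus a microphone array on a known source position is delay-and-sum beamforming: advance each channel by its source-to-microphone propagation delay so the target's wavefronts add coherently while off-axis sound adds incoherently. The sketch below is an illustrative, hypothetical implementation (the function name, integer-sample delays, and 343 m/s sound speed are assumptions, not details from the paper):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air


def delay_and_sum(signals, mic_positions, source_position, fs):
    """Steer a microphone array toward a known 3-D source position.

    signals:         (num_mics, num_samples) synchronized recordings
    mic_positions:   (num_mics, 3) coordinates in meters
    source_position: (3,) coordinates in meters
    fs:              sampling rate in Hz
    Returns the aligned-and-averaged (beamformed) output signal.
    """
    # Propagation delay from the source to each microphone.
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    delays = distances / SPEED_OF_SOUND

    # Advance each channel (relative to the nearest mic) so the
    # source's wavefronts line up; here rounded to whole samples.
    shifts = np.round((delays - delays.min()) * fs).astype(int)

    num_mics, n = signals.shape
    out = np.zeros(n)
    for m in range(num_mics):
        out[: n - shifts[m]] += signals[m, shifts[m]:]
    return out / num_mics
```

In this framing, the paper's contribution is the localization front end: the audio-video tracker supplies the `source_position` estimate that the beamformer needs, which is why localization accuracy translates directly into separation gain.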