Audio-video array source separation for perceptual user interfaces

  • Authors:
  • Kevin Wilson; Neal Checka; David Demirdjian; Trevor Darrell

  • Affiliations:
  • MIT Artificial Intelligence Lab, Cambridge, MA (all authors)

  • Venue:
  • Proceedings of the 2001 workshop on Perceptive user interfaces
  • Year:
  • 2001


Abstract

Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
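The abstract does not specify the array-steering algorithm, but the classic way to focus a microphone array on a known source position is delay-and-sum beamforming: advance each channel by its source-to-microphone propagation delay so the target's wavefronts add coherently while off-axis sound adds incoherently. The sketch below is an illustrative, hypothetical implementation (the function name, integer-sample delays, and 343 m/s sound speed are assumptions, not details from the paper):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air


def delay_and_sum(signals, mic_positions, source_position, fs):
    """Steer a microphone array toward a known 3-D source position.

    signals:         (num_mics, num_samples) synchronized recordings
    mic_positions:   (num_mics, 3) coordinates in meters
    source_position: (3,) coordinates in meters
    fs:              sampling rate in Hz
    Returns the aligned-and-averaged (beamformed) output signal.
    """
    # Propagation delay from the source to each microphone.
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    delays = distances / SPEED_OF_SOUND

    # Advance each channel (relative to the nearest mic) so the
    # source's wavefronts line up; here rounded to whole samples.
    shifts = np.round((delays - delays.min()) * fs).astype(int)

    num_mics, n = signals.shape
    out = np.zeros(n)
    for m in range(num_mics):
        out[: n - shifts[m]] += signals[m, shifts[m]:]
    return out / num_mics
```

In this framing, the paper's contribution is the localization front end: the audio-video tracker supplies the `source_position` estimate that the beamformer needs, which is why localization accuracy translates directly into separation gain.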