Audio-visual Segmentation and "The Cocktail Party Effect"

  • Authors:
  • Trevor Darrell; John W. Fisher, III; Paul A. Viola; William T. Freeman

  • Venue:
  • ICMI '00 Proceedings of the Third International Conference on Advances in Multimodal Interfaces
  • Year:
  • 2000

Abstract

Audio-based interfaces usually suffer when noise or other acoustic sources are present in the environment. For robust audio recognition, a single source must first be isolated. Existing solutions to this problem generally require special microphone configurations, and often assume prior knowledge of the spurious sources. We have developed new algorithms for segmenting streams of audio-visual information into their constituent sources by exploiting the mutual information present between audio and visual tracks. Automatic face recognition and image motion analysis methods are used to generate visual features for a particular user; empirically these features have high mutual information with audio recorded from that user. We show how audio utterances from several speakers recorded with a single microphone can be separated into constituent streams; we also show how the method can help reduce the effect of noise in automatic speech recognition.
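
The idea of using mutual information between audio and visual tracks to decide which speaker produced a sound can be illustrated with a small sketch. The abstract does not specify the estimator or the features, so everything below is an assumption made for illustration: the audio is reduced to a per-frame energy value, each candidate speaker is reduced to a per-frame scalar motion value, and mutual information is approximated under a joint Gaussian assumption as I = -0.5 * log(1 - rho^2), where rho is the Pearson correlation. The names `gaussian_mi` and `pick_speaker` and the synthetic data are hypothetical and are not the authors' method.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def gaussian_mi(x, y):
    """Mutual information in nats, ASSUMING x and y are jointly Gaussian.

    This closed form is a stand-in estimator chosen for simplicity; it is
    not taken from the paper.
    """
    rho = pearson(x, y)
    rho2 = min(rho * rho, 1.0 - 1e-12)  # guard against log(0)
    return -0.5 * math.log(1.0 - rho2)


def pick_speaker(audio_energy, visual_tracks):
    """Return the visual track sharing the most information with the audio."""
    scores = {name: gaussian_mi(audio_energy, track)
              for name, track in visual_tracks.items()}
    return max(scores, key=scores.get), scores


# Synthetic data (hypothetical): speaker A's motion drives the audio energy,
# while speaker B moves independently of the sound.
rng = random.Random(0)
n = 500
motion_a = [rng.gauss(0, 1) for _ in range(n)]
motion_b = [rng.gauss(0, 1) for _ in range(n)]
audio = [0.8 * m + 0.6 * rng.gauss(0, 1) for m in motion_a]

best, scores = pick_speaker(audio, {"A": motion_a, "B": motion_b})
print(best, scores)
```

In this toy setup the track that actually drives the audio receives the higher score. Real audio-visual features are high-dimensional and rarely Gaussian, so a practical system would need a different estimator and feature representation than this sketch uses.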