Integration of speech and vision using mutual information

  • Authors:
  • D. Roy

  • Affiliations:
  • Media Lab., MIT, Cambridge, MA, USA

  • Venue:
  • ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 04
  • Year:
  • 2000

Abstract

We are developing a system which learns words from co-occurring spoken and visual input. The goal is to automatically segment continuous speech at word boundaries without a lexicon, and to form visual categories which correspond to spoken words. Mutual information is used to integrate acoustic and visual distance metrics in order to extract an audio-visual lexicon from raw input. We report results of experiments with a corpus of infant-directed speech and images.
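The abstract only sketches how mutual information ties the two modalities together. As a minimal illustrative sketch (not the paper's implementation), mutual information between two binary events, an acoustic word candidate matching a speech segment and a visual category matching the co-occurring image, can be estimated from co-occurrence counts; candidate pairs with high mutual information would be kept as audio-visual lexical entries. The data and threshold below are hypothetical.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(A; V) in bits from (a, v) observation pairs.

    Each pair records, for one utterance/image co-occurrence, whether an
    acoustic word candidate matched the speech (a in {0, 1}) and whether a
    visual category matched the image (v in {0, 1}).
    """
    n = len(pairs)
    joint = Counter(pairs)              # counts of (a, v) outcomes
    pa = Counter(a for a, _ in pairs)   # marginal counts of a
    pv = Counter(v for _, v in pairs)   # marginal counts of v
    mi = 0.0
    for (a, v), c in joint.items():
        # p(a,v) * log2( p(a,v) / (p(a) p(v)) ), with counts normalized by n
        mi += (c / n) * math.log2(c * n / (pa[a] * pv[v]))
    return mi

# Hypothetical example: 100 speech/image observations in which the acoustic
# candidate "ball" and a ball-like visual prototype tend to co-occur.
observations = [(1, 1)] * 30 + [(0, 0)] * 55 + [(1, 0)] * 8 + [(0, 1)] * 7
score = mutual_information(observations)
print(f"I(A; V) = {score:.3f} bits")
if score > 0.2:  # hypothetical selection threshold
    print("Keep this audio-visual pair as a lexical candidate.")
```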