EM detection of common origin of multi-modal cues

  • Authors:
  • A. K. Noulas;B. J. A. Kröse

  • Affiliations:
  • University of Amsterdam and Intelligent Systems Lab Amsterdam;University of Amsterdam and Intelligent Systems Lab Amsterdam

  • Venue:
  • Proceedings of the 8th International Conference on Multimodal Interfaces
  • Year:
  • 2006

Abstract

Content analysis of clips containing people speaking involves processing informative cues from different modalities. These cues are usually the words extracted from the audio modality and the identities of the persons appearing in the video modality of the clip. To assign these cues efficiently to the person who produced them, we propose a Bayesian network model that uses the extracted feature characteristics, their relations, and their temporal patterns. We use the EM algorithm: the E-step estimates the expectation of the complete-data log-likelihood with respect to the hidden variables, that is, the identities of the speakers and the visible persons. The M-step then computes the person models that maximize this expectation. This framework produces excellent results, exhibiting exceptional robustness when dealing with low-quality data.
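
The alternation between the E-step and the M-step described in the abstract can be illustrated with a minimal sketch. This is not the authors' Bayesian network: it assumes each cue is a single scalar feature, that each person is modeled by a one-dimensional Gaussian, and it ignores the temporal patterns and the relations between modalities. The names `gaussian_pdf` and `em_person_models`, and the quantile-based initialization, are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (assumed person model, not from the paper)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_person_models(cues, n_persons, n_iter=30):
    """Fit one Gaussian per person to scalar cues with plain EM.

    The hidden variable is the identity of the person behind each cue.
    Returns (means, variances, priors, responsibilities).
    """
    n = len(cues)
    ordered = sorted(cues)
    # Deterministic initialization: spread the means over the sorted cues.
    means = [ordered[int((k + 0.5) * n / n_persons)] for k in range(n_persons)]
    variances = [1.0] * n_persons
    priors = [1.0 / n_persons] * n_persons
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each identity for each cue.
        resp = []
        for x in cues:
            joint = [priors[k] * gaussian_pdf(x, means[k], variances[k])
                     for k in range(n_persons)]
            total = sum(joint)
            if total == 0.0:
                # Numerical underflow: fall back to a uniform posterior.
                resp.append([1.0 / n_persons] * n_persons)
            else:
                resp.append([j / total for j in joint])

        # M-step: person models that maximize the expected log-likelihood.
        for k in range(n_persons):
            weight = sum(r[k] for r in resp)
            if weight < 1e-12:
                continue  # no cue supports this person; keep the old model
            means[k] = sum(r[k] * x for r, x in zip(resp, cues)) / weight
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, cues))
            variances[k] = max(spread / weight, 1e-6)  # floor avoids collapse
            priors[k] = weight / n

    return means, variances, priors, resp


if __name__ == "__main__":
    # Toy cues: two persons whose features cluster around 0 and around 5.
    cues = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 5.2, 4.9, 5.1, 4.8]
    means, variances, priors, resp = em_person_models(cues, n_persons=2)
    labels = [max(range(2), key=lambda k: r[k]) for r in resp]
    print(means, priors, labels)
```

Taking the most probable identity for each cue, as in the last lines, turns the soft responsibilities from the E-step into a hard assignment of cues to persons. A model closer to the one in the abstract would replace the independent Gaussians with the network's conditional distributions over audio and video features.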