Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS

  • Authors:
  • Qingju Liu; Wenwu Wang; Philip Jackson

  • Affiliations:
  • Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, United Kingdom (all authors)

  • Venue:
  • LVA/ICA '10: Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation
  • Year:
  • 2010


Abstract

Recent studies show that the visual information contained in visual speech can help enhance the performance of audio-only blind source separation (BSS) algorithms. Such information is exploited through a statistical characterisation of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present two new contributions. An adapted expectation maximization (AEM) algorithm is proposed in the training process to model the audio-visual coherence based on the extracted features. This coherence is then exploited to solve the permutation problem in the frequency domain using a new sorting scheme. We test our algorithm on the XM2VTS multimodal database. The experimental results show that the proposed algorithm outperforms traditional audio-only BSS.
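To illustrate the general idea of using audio-visual coherence to resolve frequency-domain permutations, the sketch below trains a GMM on joint audio-visual feature vectors and then, at each frequency bin, keeps the source ordering whose audio features are most coherent with the target speaker's visual stream. This is a minimal illustration under assumed feature shapes and function names (train_av_gmm, resolve_permutation, sep_feats, visual_feats are all hypothetical), not the paper's exact AEM training procedure or sorting scheme.

```python
# Minimal sketch: GMM over joint audio-visual features used to pick a source
# ordering per frequency bin. Feature definitions, shapes and names below are
# illustrative assumptions, not the algorithm described in the paper.
import numpy as np
from itertools import permutations
from sklearn.mixture import GaussianMixture


def train_av_gmm(audio_feats, visual_feats, n_components=16, seed=0):
    """Fit a GMM to joint audio-visual feature vectors.

    audio_feats:  (n_frames, d_a) features from synchronised clean speech.
    visual_feats: (n_frames, d_v) features from the speaker's lip region.
    """
    joint = np.hstack([audio_feats, visual_feats])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    gmm.fit(joint)
    return gmm


def resolve_permutation(gmm, sep_feats, visual_feats):
    """Choose, per frequency bin, the ordering of separated sources that is
    most coherent with the target speaker's visual stream.

    sep_feats:    (n_bins, n_sources, n_frames, d_a) audio features computed
                  from the separated sub-band signals.
    visual_feats: (n_frames, d_v) visual features of the target speaker.
    Returns a list of source orderings (tuples), one per frequency bin.
    """
    n_bins, n_sources = sep_feats.shape[:2]
    orderings = []
    for k in range(n_bins):
        best_perm, best_ll = None, -np.inf
        for perm in permutations(range(n_sources)):
            # Score the source placed in the "target" slot (index 0 after
            # reordering) against the visual features under the trained GMM.
            cand = sep_feats[k, perm[0]]             # (n_frames, d_a)
            joint = np.hstack([cand, visual_feats])  # (n_frames, d_a + d_v)
            ll = gmm.score_samples(joint).mean()     # mean log-likelihood
            if ll > best_ll:
                best_perm, best_ll = perm, ll
        orderings.append(best_perm)
    return orderings
```

In this sketch the per-bin coherence score stands in for the paper's sorting scheme: a higher joint likelihood under the audio-visual GMM indicates that the candidate sub-band source belongs to the visually observed speaker, which is what allows the permutation ambiguity across frequency bins to be aligned consistently.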