Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS

  • Authors:
  • Qingju Liu; Wenwu Wang; Philip Jackson

  • Affiliations:
  • Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, United Kingdom (all authors)

  • Venue:
  • LVA/ICA '10: Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation
  • Year:
  • 2010


Abstract

Recent studies show that the visual information contained in visual speech can help enhance the performance of audio-only blind source separation (BSS) algorithms. Such information is exploited through a statistical characterisation of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present two new contributions. An adapted expectation maximization (AEM) algorithm is proposed in the training process to model the audio-visual coherence based on the extracted features. This coherence is then exploited to solve the permutation problem in the frequency domain using a new sorting scheme. We test our algorithm on the XM2VTS multimodal database. The experimental results show that the proposed algorithm outperforms traditional audio-only BSS.
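To illustrate the general idea of using audio-visual coherence to resolve frequency-domain permutations, the sketch below trains a GMM on joint audio-visual feature vectors and then, at each frequency bin, keeps the source ordering whose audio features are most coherent with the target speaker's visual stream. This is a minimal illustration under assumed feature shapes and function names (train_av_gmm, resolve_permutation, sep_feats, visual_feats are all hypothetical), not the paper's exact AEM training procedure or sorting scheme.

```python
# Minimal sketch: GMM over joint audio-visual features used to pick a source
# ordering per frequency bin. Feature definitions, shapes and names below are
# illustrative assumptions, not the algorithm described in the paper.
import numpy as np
from itertools import permutations
from sklearn.mixture import GaussianMixture


def train_av_gmm(audio_feats, visual_feats, n_components=16, seed=0):
    """Fit a GMM to joint audio-visual feature vectors.

    audio_feats:  (n_frames, d_a) features from synchronised clean speech.
    visual_feats: (n_frames, d_v) features from the speaker's lip region.
    """
    joint = np.hstack([audio_feats, visual_feats])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    gmm.fit(joint)
    return gmm


def resolve_permutation(gmm, sep_feats, visual_feats):
    """Choose, per frequency bin, the ordering of separated sources that is
    most coherent with the target speaker's visual stream.

    sep_feats:    (n_bins, n_sources, n_frames, d_a) audio features computed
                  from the separated sub-band signals.
    visual_feats: (n_frames, d_v) visual features of the target speaker.
    Returns a list of source orderings (tuples), one per frequency bin.
    """
    n_bins, n_sources = sep_feats.shape[:2]
    orderings = []
    for k in range(n_bins):
        best_perm, best_ll = None, -np.inf
        for perm in permutations(range(n_sources)):
            # Score the source placed in the "target" slot (index 0 after
            # reordering) against the visual features under the trained GMM.
            cand = sep_feats[k, perm[0]]             # (n_frames, d_a)
            joint = np.hstack([cand, visual_feats])  # (n_frames, d_a + d_v)
            ll = gmm.score_samples(joint).mean()     # mean log-likelihood
            if ll > best_ll:
                best_perm, best_ll = perm, ll
        orderings.append(best_perm)
    return orderings
```

In this sketch the per-bin coherence score stands in for the paper's sorting scheme: a higher joint likelihood under the audio-visual GMM indicates that the candidate sub-band source belongs to the visually observed speaker, which is what allows the permutation ambiguity across frequency bins to be aligned consistently.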