Recent studies show that facial information contained in visual speech can be helpful for improving the performance of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterization of the coherence between audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present three contributions. First, using the synchronized audio-visual features, we propose an adapted expectation-maximization (AEM) algorithm to model the audio-visual coherence in the off-line training process. Second, to improve the accuracy of this coherence model, we use a frame selection scheme to discard nonstationary features. Third, with the coherence maximization technique, we develop a new sorting method to solve the permutation problem in the frequency domain. We test our algorithm on a multimodal speech database composed of different combinations of vowels and consonants. The experimental results show that the proposed algorithm outperforms traditional audio-only BSS, which confirms the benefit of using visual speech to assist in the separation of audio sources.
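The coherence model described above is a GMM fitted to joint audio-visual feature vectors. The abstract's adapted EM (AEM) algorithm and frame selection scheme are specific to the paper, but the underlying recursion is standard EM for a Gaussian mixture. The following is a minimal sketch of that standard recursion for a diagonal-covariance GMM, assuming features have already been extracted and synchronized per frame; all names are illustrative and the paper's adaptation step is omitted:

```python
import numpy as np

def fit_gmm_em(X, K=3, iters=50):
    """Fit a diagonal-covariance GMM to the rows of X with standard EM.

    X : (N, D) array of joint audio-visual feature vectors (one per frame).
    Returns mixing weights w (K,), means mu (K, D), variances var (K, D).
    """
    N, D = X.shape
    # Deterministic farthest-point initialisation of the component means.
    idx = [0]
    for _ in range(1, K):
        d = ((X[:, None, :] - X[idx]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    mu = X[idx].astype(float)
    var = np.ones((K, D))
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities r[n, k] proportional to
        # w_k * N(x_n | mu_k, diag(var_k)), computed in the log domain.
        log_p = (np.log(w)
                 - 0.5 * (((X[:, None, :] - mu) ** 2) / var
                          + np.log(2.0 * np.pi * var)).sum(axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from soft counts.
        Nk = r.sum(axis=0)
        w = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        var = (r.T @ X**2) / Nk[:, None] - mu**2 + 1e-6  # variance floor
    return w, mu, var
```

Once trained, the log-likelihood of a candidate (audio, visual) feature pair under such a model provides a coherence score; a sorting method like the one in the paper can then align permuted frequency-domain components by maximizing that score.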