Use of bimodal coherence to resolve spectral indeterminacy in Convolutive BSS
LVA/ICA'10 Proceedings of the 9th international conference on Latent variable analysis and signal separation
Looking at the speaker's face can help a listener hear a speech signal in a noisy environment and extract it from competing sources before identification. This suggests that the visual signals of speech (movements of the visible articulators) could be used in speech enhancement or extraction systems. In this paper, we present a novel algorithm that couples the audiovisual coherence of speech signals, estimated with statistical tools, with audio blind source separation (BSS) techniques. The algorithm is applied to the difficult and realistic case of convolutive mixtures. It works mainly in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture in each frequency channel. Separation is performed frequency by frequency with an audio BSS algorithm. The audio and visual information is modeled by a newly proposed statistical model, which is then used to resolve the standard source-permutation and scale-factor ambiguities that arise in each frequency channel after the audio blind separation stage. The proposed method is shown to be efficient for 2 × 2 convolutive mixtures and offers promising perspectives for extracting a particular speech source of interest from complex mixtures.
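The permutation ambiguity described above can be illustrated with a minimal sketch. Note this is not the paper's statistical audiovisual model: as a stand-in, the "visual" cue is simply assumed to be an envelope coherent with the target source's broadband energy, and each frequency bin's permutation is resolved by picking the separated estimate that correlates best with that cue (the scale-factor ambiguity is not handled here). The signals, shapes, and the `corr` helper are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames = 32, 200

# Synthetic time-varying magnitude envelopes for two speech sources
# (per frequency bin, per time frame).
s1 = np.abs(rng.standard_normal((n_freq, n_frames))) * (1 + np.sin(np.linspace(0, 6, n_frames)))
s2 = np.abs(rng.standard_normal((n_freq, n_frames))) * (1 + np.cos(np.linspace(0, 6, n_frames)))

# Hypothetical "visual" cue: a lip-movement envelope assumed coherent with
# source 1's broadband energy (a stand-in for the paper's statistical model).
visual = s1.mean(axis=0)

# Simulate the output of a per-frequency BSS stage: in each frequency bin,
# the two separated sources come back in an arbitrary order.
perm = rng.integers(0, 2, size=n_freq).astype(bool)
est1 = np.where(perm[:, None], s2, s1)
est2 = np.where(perm[:, None], s1, s2)

def corr(a, b):
    """Pearson correlation between two 1-D envelopes."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Resolve the permutation bin by bin: keep in channel 1 whichever estimate
# correlates better with the visual envelope.
aligned = np.empty_like(est1)
flipped = np.zeros(n_freq, dtype=bool)
for f in range(n_freq):
    if corr(est1[f], visual) >= corr(est2[f], visual):
        aligned[f] = est1[f]
    else:
        aligned[f] = est2[f]
        flipped[f] = True

# `flipped` should now match the simulated permutations on almost every bin,
# so `aligned` reassembles a consistent spectrum for the target source.
print(f"permutations recovered: {(flipped == perm).mean():.0%}")
```

In the paper's full setting, the per-bin decision is driven by the learned audiovisual statistical model rather than a raw correlation, but the structure, i.e. separate each frequency channel, then use the visual stream to align the channels, is the same.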