Separation of audio-visual speech sources: a new approach exploiting the audio-visual coherence of speech stimuli

Authors:
David Sodoyer; Jean-Luc Schwartz; Laurent Girin; Jacob Klinkisch; Christian Jutten
Affiliations:
Institut de la Communication Parlée, Institut National Polytechnique de Grenoble, Université Stendhal, ICP, INPG, Grenoble Cedex, France (Sodoyer, Schwartz, Girin, Klinkisch); Laboratoire des Images et des Signaux, Institut National Polytechnique de Grenoble, Université Joseph Fourier, LIS, INPG, Grenoble Cedex, France (Jutten)
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2002

Abstract

We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on automatic lipreading: the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions of independence or non-Gaussianity. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audio-visual sources. We show that, if a statistical model of the joint probability of visual and spectral audio input is learnt to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker and embedded in a mixture of other voices. We show that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while very preliminary, are encouraging, and they are discussed with respect to their potential complementarity with traditional audio-only separation or enhancement techniques.
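The core idea in the abstract is to choose a separating vector so that the extracted signal's spectral features are as probable as possible, jointly with the observed lip movements, under a learnt audio-visual model. The Python/NumPy sketch below illustrates that idea on a toy two-sensor instantaneous mixture. Everything concrete in it is an assumption for illustration and is not the authors' implementation. A single joint Gaussian stands in for the learnt audio-visual model. A scalar "lip opening" per frame stands in for the visual parameters. Two FFT band log-energies stand in for the spectral audio features. The sources are synthetic tone pairs rather than speech, and a grid search over the angle of a unit separating vector replaces whatever optimizer the paper uses. All names (`band_log_energies`, `GaussianAVModel`, `separate`, `synth_source`, `demo`, `FRAME`, `N_BANDS`) are hypothetical.

```python
import numpy as np

FRAME = 64   # samples per analysis frame (hypothetical choice)
N_BANDS = 2  # number of FFT bands used as toy "spectral" audio features (hypothetical)


def band_log_energies(y):
    """Per-frame log energies in N_BANDS equal FFT bands of y, after normalizing y to unit variance."""
    y = (y - y.mean()) / (y.std() + 1e-12)  # removes the scale indeterminacy of the separating vector
    n_frames = len(y) // FRAME
    frames = y[: n_frames * FRAME].reshape(n_frames, FRAME)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(power, N_BANDS, axis=1)
    return np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-12)


class GaussianAVModel:
    """Single joint Gaussian over [visual, audio] frame vectors (hypothetical stand-in for the learnt model)."""

    def fit(self, visual, audio):
        z = np.hstack([visual, audio])
        self.mean = z.mean(axis=0)
        cov = np.cov(z, rowvar=False) + 1e-6 * np.eye(z.shape[1])  # small ridge keeps cov invertible
        self.inv = np.linalg.inv(cov)
        self.logdet = np.linalg.slogdet(cov)[1]
        return self

    def log_likelihood(self, visual, audio):
        """Sum over frames of log N([v_t, a_t]; mean, cov)."""
        d = np.hstack([visual, audio]) - self.mean
        maha = np.einsum("ij,jk,ik->i", d, self.inv, d)  # per-frame Mahalanobis distance
        k = d.shape[1]
        return float(np.sum(-0.5 * (maha + self.logdet + k * np.log(2 * np.pi))))


def separate(x, visual, model, n_angles=360):
    """Grid search over unit vectors b = (cos t, sin t); keeps the b maximizing the AV likelihood of b @ x."""
    best_ll, best_b = -np.inf, None
    for t in np.linspace(0.0, np.pi, n_angles, endpoint=False):  # b and -b give identical features
        b = np.array([np.cos(t), np.sin(t)])
        ll = model.log_likelihood(visual, band_log_energies(b @ x))
        if ll > best_ll:
            best_ll, best_b = ll, b
    return best_b, best_b @ x


def synth_source(openings, bins, rng):
    """Toy 'speech': per frame, a low tone scaled by o plus a high tone scaled by (1 - o)."""
    n = np.arange(FRAME)
    frames = []
    for o in openings:
        p1, p2 = rng.uniform(0, 2 * np.pi, size=2)
        frames.append(o * np.sin(2 * np.pi * bins[0] * n / FRAME + p1)
                      + (1 - o) * np.sin(2 * np.pi * bins[1] * n / FRAME + p2))
    return np.concatenate(frames)


def demo(n_frames=400, seed=0):
    rng = np.random.default_rng(seed)
    # Training: learn the joint model on clean target signal and its (synthetic) lip opening.
    o_train = rng.uniform(0.1, 0.9, n_frames)
    model = GaussianAVModel().fit(o_train[:, None],
                                  band_log_energies(synth_source(o_train, (4, 24), rng)))
    # Test: a new target utterance (tracked by the lips) mixed with an interferer on 2 sensors.
    o, u = rng.uniform(0.1, 0.9, n_frames), rng.uniform(0.1, 0.9, n_frames)
    s1, s2 = synth_source(o, (4, 24), rng), synth_source(u, (6, 20), rng)
    A = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical stationary mixing matrix
    x = A @ np.vstack([s1, s2])
    b_hat, y_hat = separate(x, o[:, None], model)
    return b_hat, y_hat, s1
```

Calling `b_hat, y_hat, s1 = demo()` returns the selected separating vector, the extracted signal, and the clean target for comparison. Normalizing the candidate output to unit variance before computing features means only the direction of the separating vector has to be searched, which is why a one-dimensional angle grid suffices for two sensors; with more sensors, a general-purpose optimizer would be a natural replacement for the grid.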