Blind Non-stationary Sources Separation by Sparsity in a Linear Instantaneous Mixture
ICA '09 Proceedings of the 8th International Conference on Independent Component Analysis and Signal Separation
Voice activity detection using audio-visual information
DSP'09 Proceedings of the 16th International Conference on Digital Signal Processing
An improvement in audio-visual voice activity detection for automatic speech recognition
IEA/AIE'10 Proceedings of the 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems - Volume Part I
NOLISP'09 Proceedings of the 2009 International Conference on Advances in Nonlinear Speech Processing
Robust visual speakingness detection using bi-level HMM
Pattern Recognition
Audio-visual speech source separation consists of combining visual speech processing techniques (e.g., lip parameter tracking) with source separation methods to improve the extraction of a speech source of interest from a mixture of acoustic signals. In this paper, we present a new approach that combines visual information with separation methods based on the sparseness of speech: visual information is used as a voice activity detector (VAD), which is combined with a new geometric separation method. The proposed audio-visual method is shown to be efficient at extracting a real spontaneous speech utterance in the difficult case of convolutive mixtures, even when the competing sources are highly non-stationary. Typical gains of 18-20 dB in signal-to-interference ratio are obtained for a wide range of (2x2) and (3x3) mixtures. Moreover, the overall process is computationally much simpler than previously proposed audio-visual separation schemes.
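The core idea of pairing a visual VAD with a geometric, sparsity-based separation can be sketched in a toy setting. The snippet below is a minimal illustration under simplifying assumptions (a noiseless 2x2 linear instantaneous mixture, a perfectly reliable VAD signal, and Laplacian surrogates for sparse speech), not the paper's actual convolutive algorithm: frames where the visual VAD marks the target as silent expose the interferer's mixing direction, which is then projected out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2x2 linear instantaneous mixture (illustrative assumption;
# the paper also treats convolutive mixtures, which need per-frequency handling).
n = 4000
s1 = rng.laplace(size=n)          # sparse "speech" target
s2 = rng.laplace(size=n)          # interferer
vad = np.ones(n, dtype=bool)      # hypothetical visual VAD output for the target
vad[:1000] = False                # target silent in the first frames
s1[~vad] = 0.0                    # target truly inactive when the VAD says so

A = np.array([[1.0, 0.7],
              [0.5, 1.0]])        # unknown mixing matrix
x = A @ np.vstack([s1, s2])       # observed mixtures, shape (2, n)

# During target-silent frames x = a2 * s2, so the principal eigenvector of
# the silent-frame covariance estimates the interferer's mixing direction a2.
cov = np.cov(x[:, ~vad])
eigvals, eigvecs = np.linalg.eigh(cov)
a2_hat = eigvecs[:, np.argmax(eigvals)]

# Project the mixtures onto the direction orthogonal to a2 to cancel s2.
w = np.array([-a2_hat[1], a2_hat[0]])
y = w @ x                         # estimate of the target (up to scale)

# Residual interferer gain relative to the target gain should be tiny.
leak = np.abs(w @ A[:, 1]) / np.abs(w @ A[:, 0])
print(f"residual interferer gain: {leak:.2e}")
```

In this noiseless toy case the silent-frame covariance is exactly rank one, so the projection cancels the interferer almost perfectly; with real convolutive mixtures the same geometric reasoning is applied per frequency band, where sparseness and the VAD matter far more.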