This contribution describes experiments in audio-visual isolated-word recognition. The results will be used to improve our voice dialogue system, to which visual speech recognition will be added. Voice dialogue systems can be deployed in train or bus stations and other environments with relatively high noise levels, where the visual component of speech can improve the recognition rate, mainly in noisy conditions. Audio-visual recognition of isolated words in our experiments was based on two-stream Hidden Markov Models (HMMs) and on HMMs of single Czech phonemes and visemes. Different visual speech features and different numbers of HMM states and mixtures were evaluated in individual tests. In the subsequent experiments, isolated words were recognized after the HMMs had been trained, and babble noise was added to the acoustic speech signal in successive steps.
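The abstract does not spell out the fusion rule, but a common form of two-stream HMM decoding combines the per-word log-likelihoods of the audio and visual streams with a stream weight. The sketch below illustrates that idea only; the function names, word labels, and log-likelihood values are hypothetical, not taken from the paper.

```python
def fuse_streams(audio_loglik, visual_loglik, lam):
    """Weighted log-likelihood fusion for two-stream HMM decoding.

    lam is the audio-stream weight in [0, 1]; (1 - lam) weights the
    visual stream. Both arguments map word labels to log-likelihoods.
    """
    return {w: lam * audio_loglik[w] + (1.0 - lam) * visual_loglik[w]
            for w in audio_loglik}

def recognize(audio_loglik, visual_loglik, lam):
    """Return the word with the highest fused score."""
    scores = fuse_streams(audio_loglik, visual_loglik, lam)
    return max(scores, key=scores.get)

# Illustrative (made-up) log-likelihoods for three isolated words.
# In noisy audio the acoustic stream alone picks the wrong word;
# adding the visual stream corrects the decision.
audio = {"ano": -120.0, "ne": -118.0, "konec": -130.0}
visual = {"ano": -60.0, "ne": -75.0, "konec": -70.0}

print(recognize(audio, visual, lam=1.0))  # audio only -> "ne"
print(recognize(audio, visual, lam=0.5))  # fused -> "ano"
```

Lowering the audio weight as the acoustic SNR drops is the usual way such systems exploit the visual stream in noise; the weight is typically tuned on held-out noisy data.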