This contribution describes experiments in audio-visual isolated-word recognition. The results will be used to improve our voice dialogue system, to which visual speech recognition will be added. Voice dialogue systems can be deployed in train or bus stations and other environments with relatively high noise levels, where the visual component of speech can improve the recognition rate, mainly in noisy conditions. Audio-visual recognition of isolated words in our experiments was based on two-stream Hidden Markov Models (HMMs) and on HMMs of single Czech phonemes and visemes. Different visual speech features and different numbers of HMM states and mixtures were evaluated in individual tests. In the subsequent experiments, isolated words were recognized after the HMMs had been trained, and babble noise was added to the acoustic speech signal in successive steps.
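The abstract does not spell out the fusion rule, but a common form of two-stream HMM decoding combines the per-word log-likelihoods of the audio and visual streams with a stream weight. The sketch below illustrates that idea only; the function names, word labels, and log-likelihood values are hypothetical, not taken from the paper.

```python
def fuse_streams(audio_loglik, visual_loglik, lam):
    """Weighted log-likelihood fusion for two-stream HMM decoding.

    lam is the audio-stream weight in [0, 1]; (1 - lam) weights the
    visual stream. Both arguments map word labels to log-likelihoods.
    """
    return {w: lam * audio_loglik[w] + (1.0 - lam) * visual_loglik[w]
            for w in audio_loglik}

def recognize(audio_loglik, visual_loglik, lam):
    """Return the word with the highest fused score."""
    scores = fuse_streams(audio_loglik, visual_loglik, lam)
    return max(scores, key=scores.get)

# Illustrative (made-up) log-likelihoods for three isolated words.
# In noisy audio the acoustic stream alone picks the wrong word;
# adding the visual stream corrects the decision.
audio = {"ano": -120.0, "ne": -118.0, "konec": -130.0}
visual = {"ano": -60.0, "ne": -75.0, "konec": -70.0}

print(recognize(audio, visual, lam=1.0))  # audio only -> "ne"
print(recognize(audio, visual, lam=0.5))  # fused -> "ano"
```

Lowering the audio weight as the acoustic SNR drops is the usual way such systems exploit the visual stream in noise; the weight is typically tuned on held-out noisy data.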