A two-channel training algorithm for hidden Markov model and its application to lip reading

Authors:
Liang Dong; Say Wei Foo; Yong Lian
Affiliations:
Department of Electrical and Computer Engineering, National University of Singapore, Singapore; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2005

Abstract

The hidden Markov model (HMM) has been a popular mathematical approach to sequence classification tasks such as speech recognition since the 1980s. In this paper, a novel two-channel training strategy is proposed for the discriminative training of HMMs. The strategy adopts as its criterion function a novel separable-distance function that measures the difference between a pair of training samples. The symbol emission matrix of an HMM is split into two channels: a static channel that maintains the validity of the HMM and a dynamic channel that is modified to maximize the separable distance. The parameters of the two-channel HMM are estimated by iterative application of expectation-maximization (EM) operations. To demonstrate the approach, a hierarchical speaker-dependent visual speech recognition system is trained using two-channel HMMs. Experiments on identifying a group of confusable visemes indicate that the proposed approach increases recognition accuracy by an average of 20% compared with conventional HMMs trained with Baum-Welch estimation.
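The abstract describes the two-channel idea only at a high level, so the Python sketch below illustrates it under stated assumptions rather than reproducing the authors' method. It assumes (1) a discrete HMM with initial distribution `pi`, transition matrix `A` and emission matrix `B`; (2) that the separable distance for a pair of samples is the log-likelihood gap log P(O_target | model) - log P(O_rival | model); and (3) that the static channel is a fixed fraction `static_weight` of an initial emission matrix, with the dynamic channel carrying the remaining probability mass. The function and parameter names (`split_channels`, `train_dynamic_channel`, `static_weight`, `n_trials`) are hypothetical. The update step is a plain random search standing in for the paper's EM-based re-estimation, which the abstract does not specify.

```python
import numpy as np


def log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: returns log P(obs | pi, A, B) for a discrete HMM."""
    alpha = pi * B[:, obs[0]]           # alpha_1(i) = pi_i * b_i(o_1)
    scale = alpha.sum()
    alpha = alpha / scale
    log_p = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
        scale = alpha.sum()
        alpha = alpha / scale           # rescale each step to avoid numerical underflow
        log_p += np.log(scale)          # product of scales equals P(obs)
    return log_p


def separable_distance(pi, A, B, obs_target, obs_rival):
    # ASSUMPTION: separable distance = log-likelihood gap between the pair of samples.
    return log_likelihood(pi, A, B, obs_target) - log_likelihood(pi, A, B, obs_rival)


def split_channels(B, static_weight):
    # ASSUMPTION (hypothetical helper): the static channel is a fixed fraction of the
    # initial emission matrix; the dynamic channel holds the remaining mass.
    return static_weight * B, (1.0 - static_weight) * B


def train_dynamic_channel(pi, A, B, obs_target, obs_rival,
                          static_weight=0.7, n_trials=200, seed=0):
    """Hypothetical random-search stand-in for the paper's EM update.

    Only the dynamic channel changes; the static channel stays fixed.
    """
    rng = np.random.default_rng(seed)
    B_static, B_dynamic = split_channels(B, static_weight)
    best = separable_distance(pi, A, B_static + B_dynamic, obs_target, obs_rival)
    n_states, n_symbols = B.shape
    for _ in range(n_trials):
        # Each dynamic row is a Dirichlet draw rescaled to (1 - static_weight),
        # so every row of B_static + candidate still sums to 1.
        candidate = (1.0 - static_weight) * rng.dirichlet(np.ones(n_symbols), size=n_states)
        d = separable_distance(pi, A, B_static + candidate, obs_target, obs_rival)
        if d > best:                    # keep the best candidate seen so far
            best, B_dynamic = d, candidate
    return B_static, B_dynamic, best


# Toy 2-state, 3-symbol model (illustrative numbers, not from the paper)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.3, 0.6]])
target = [0, 0, 1, 2, 2]   # sample of the class this HMM models
rival = [2, 2, 1, 0, 0]    # confusable sample from another class

d0 = separable_distance(pi, A, B, target, rival)
B_s, B_d, d1 = train_dynamic_channel(pi, A, B, target, rival)
print(f"separable distance before: {d0:.3f}, after: {d1:.3f}")
```

Because only the dynamic rows change, and each is rescaled to `1 - static_weight`, every emission row still sums to 1. No emission probability can fall below `static_weight` times its initial value, which is one way to read the abstract's statement that the static channel maintains the validity of the HMM. Starting from the unsplit model and keeping the best candidate guarantees that the distance never decreases, but random search scales poorly with model size. An EM-style re-estimation, as the paper uses, would be the practical choice.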