A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition

Authors:
Dahai Yu;Ovidiu Ghita;Alistair Sutherland;Paul F. Whelan
Affiliations:
Vision Systems Group, School of Electronic Engineering and Computing, Dublin City University, Dublin, Ireland (all authors)
Venue:
PSIVT '09 Proceedings of the 3rd Pacific Rim Symposium on Advances in Image and Video Technology
Year:
2009

Abstract

This paper presents a novel visual speech recognition (VSR) system based on Hidden Markov Models (HMMs) and a new representation, referred to in this paper as the Visual Speech Unit (VSU), that extends the standard viseme concept. Visemes are regarded as the smallest visual speech elements in the visual domain and have been widely used to model visual speech, but they are problematic when applied to continuous visual speech recognition. To circumvent the problems associated with standard visemes, we propose a visual speech representation that includes not only the data associated with the articulation of the visemes but also the transitional information between consecutive visemes. To evaluate the suitability of the proposed representation, an extensive set of experiments was conducted to compare the performance of the VSUs with that of the standard MPEG-4 visemes. The experimental results indicate that the developed VSR application achieved up to 90% correct recognition when applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only in the range 62-72%.
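
The abstract does not specify the features or the HMM topology, so the following is only a minimal sketch of the general idea of HMM classification: one model per class, with an observation sequence assigned to the class whose model gives it the highest likelihood. Everything concrete here is an assumption made for illustration: the discrete observation symbols, the two-state left-to-right models, the class names `vsu_a` and `vsu_b`, and the helper names `log_likelihood` and `classify`.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][k]  -- probability of emitting symbol k in state i
    """
    n = len(start)
    # Initialise with the start distribution and the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow and accumulate the log of the scale.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical toy models (not taken from the paper): two left-to-right
# HMMs over a two-symbol alphabet, stored as (start, trans, emit).
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "vsu_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "vsu_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 1], MODELS))
    print(classify([1, 1, 0, 0], MODELS))
```

A real system of this kind would normally train the model parameters from data and would likely use continuous feature vectors in place of the discrete symbols assumed here.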