The Visual Computer: International Journal of Computer Graphics
Visual speech recognition is an emerging research field. In this paper, we examine the suitability of support vector machines for visual speech recognition. Each word is modeled as a temporal sequence of visemes corresponding to the phones realized. One support vector machine is trained to recognize each viseme, and its output is converted to a posterior probability through a sigmoidal mapping. To model the temporal character of speech, the support vector machines are integrated as nodes into a Viterbi lattice. We test the proposed approach on a small visual speech recognition task, namely the recognition of the first four digits in English. The word recognition rate obtained is comparable to the best previously reported rates.
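The pipeline described in the abstract (per-viseme SVM outputs, a sigmoidal mapping to posteriors, and Viterbi decoding over a word's viseme sequence) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sigmoid parameters `a` and `b`, the viseme labels, the two-word toy lexicon, the hand-written SVM decision values, and the simple left-to-right lattice with uniform stay/advance transitions. No SVM is actually trained.

```python
import math

def sigmoid_posterior(svm_output, a=-2.0, b=0.0):
    """Map a raw SVM decision value to a pseudo-posterior with a sigmoid.

    The values of a and b are hypothetical; in practice they would be
    fitted on held-out data.
    """
    return 1.0 / (1.0 + math.exp(a * svm_output + b))

def viterbi_word_score(frames, visemes):
    """Best log-score of aligning frames to a left-to-right viseme sequence.

    frames:  list of dicts mapping viseme label -> posterior for that frame.
    visemes: ordered viseme labels of one word model.
    Assumes uniform transitions: at each frame the path either stays in the
    current viseme or advances to the next one.
    """
    neg_inf = float("-inf")
    n = len(visemes)
    # score[j] = best log-score of any path ending in viseme j at this frame
    score = [neg_inf] * n
    score[0] = math.log(frames[0][visemes[0]])
    for frame in frames[1:]:
        new = [neg_inf] * n
        for j, v in enumerate(visemes):
            stay = score[j]
            advance = score[j - 1] if j > 0 else neg_inf
            best = max(stay, advance)
            if best > neg_inf:
                new[j] = best + math.log(frame[v])
        score = new
    # A valid path must end in the last viseme of the word.
    return score[-1]

def recognize(svm_outputs, lexicon):
    """Return the word whose viseme sequence best explains the frames."""
    frames = [{v: sigmoid_posterior(f) for v, f in frame.items()}
              for frame in svm_outputs]
    return max(lexicon, key=lambda w: viterbi_word_score(frames, lexicon[w]))

# Hypothetical viseme labels and word models, for illustration only.
LABELS = ("w", "ah", "n", "t", "uw")
toy_lexicon = {"one": ["w", "ah", "n"], "two": ["t", "uw"]}

def toy_frame(strong_label, value):
    """Hand-written decision values: one viseme scores high, the rest -1.0."""
    return {v: (value if v == strong_label else -1.0) for v in LABELS}

toy_outputs = [toy_frame("w", 1.5), toy_frame("ah", 1.2),
               toy_frame("ah", 1.0), toy_frame("n", 1.3)]

print(recognize(toy_outputs, toy_lexicon))
```

Because every candidate word is scored over the same frames, the summed log-posteriors are directly comparable across words of different viseme lengths. A word model with more visemes than available frames receives a score of negative infinity and can never be selected.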