Asymmetrically boosted HMM for speech reading

Authors:
Pei Yin;Irfan Essa;James M. Rehg
Affiliations:
Georgia Institute of Technology, GVU Center, College of Computing, Atlanta, GA;Georgia Institute of Technology, GVU Center, College of Computing, Atlanta, GA;Georgia Institute of Technology, GVU Center, College of Computing, Atlanta, GA
Venue:
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
Year:
2004

Abstract

Speech reading, also known as lip reading, aims to extract visual cues from lip and facial movements to aid speech recognition. The main hurdle for speech reading is that visual measurements of lip and facial motion lack information-rich features like the Mel-frequency cepstral coefficients (MFCCs) widely used in acoustic speech recognition. MFCCs are used with hidden Markov models (HMMs) in most current speech recognition systems. Speech reading could benefit greatly from automatic selection and formation of informative features from measurements in the visual domain. These new features can then be used with HMMs to capture the dynamics of lip movement and, eventually, to recognize lip shapes. Toward this end, we use AdaBoost methods for automatic visual feature formation. Specifically, we design an asymmetric variant of the AdaBoost.M2 algorithm to deal with the ill-posed multi-class sample distribution inherent in our problem. Our experiments show that the boosted HMM approach outperforms conventional AdaBoost and HMM classifiers. Our primary contributions are the design of (a) a boosted HMM and (b) asymmetric multi-class boosting.
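
As a rough illustration of the boosting component, the sketch below implements one round of the standard AdaBoost.M2 mislabel-weight update (Freund and Schapire) in plain Python. The abstract does not say how the asymmetric variant is defined, so the `class_cost` argument, which scales the initial mislabel weights per true class, is purely an assumption made for illustration and should not be read as the authors' method. The function names and the toy data are likewise hypothetical.

```python
def init_mislabel_weights(labels, num_classes, class_cost=None):
    """Initial mislabel distribution D[i][y] over wrong labels y != labels[i].

    class_cost is a hypothetical per-class emphasis (an assumption, not taken
    from the paper); None gives the uniform AdaBoost.M2 starting point.
    """
    if class_cost is None:
        class_cost = [1.0] * num_classes
    D = [[0.0 if y == y_true else class_cost[y_true] for y in range(num_classes)]
         for y_true in labels]
    total = sum(sum(row) for row in D)
    return [[w / total for w in row] for row in D]


def m2_round(D, labels, h):
    """One standard AdaBoost.M2 round.

    h[i][y] in [0, 1] is the weak learner's plausibility that sample i has
    label y. Returns (pseudo_loss, beta, updated_D).
    """
    # Pseudo-loss: 1/2 * sum over wrong pairs of D * (1 - h(true) + h(wrong)).
    eps = 0.0
    for i, y_true in enumerate(labels):
        for y, w in enumerate(D[i]):
            if y != y_true:
                eps += 0.5 * w * (1.0 - h[i][y_true] + h[i][y])
    if not 0.0 < eps < 1.0:
        raise ValueError("this sketch assumes a pseudo-loss strictly between 0 and 1")
    beta = eps / (1.0 - eps)

    # Shrink the weight of pairs the weak learner already separates well.
    new_D = []
    for i, y_true in enumerate(labels):
        row = []
        for y, w in enumerate(D[i]):
            if y == y_true:
                row.append(0.0)
            else:
                row.append(w * beta ** (0.5 * (1.0 + h[i][y_true] - h[i][y])))
        new_D.append(row)
    z = sum(sum(row) for row in new_D)
    return eps, beta, [[w / z for w in row] for row in new_D]


# Toy imbalanced problem: three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
always_zero = [[1.0, 0.0] for _ in labels]  # weak learner that always says class 0

uniform_D = init_mislabel_weights(labels, 2)
weighted_D = init_mislabel_weights(labels, 2, class_cost=[1.0 / 3.0, 1.0])

eps_uniform, _, next_D = m2_round(uniform_D, labels, always_zero)
eps_weighted, _, _ = m2_round(weighted_D, labels, always_zero)
```

On this toy data, the always-majority weak learner has a pseudo-loss of 0.25 under the uniform start but 0.5 (no better than chance) under the inverse-frequency start. This is one way a class-dependent weighting can stop a skewed class distribution from rewarding trivial hypotheses.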