Boosted Audio-Visual HMM for Speech Reading

  • Authors:
  • Pei Yin, Irfan Essa, James M. Rehg

  • Venue:
  • AMFG '03 Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures
  • Year:
  • 2003


Abstract

We propose a new approach for combining acoustic and visual measurements to aid in recognizing lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) an HMM used to model phonemes from the acoustic signal, and (b) an HMM used to model visual feature motions from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost, on the basis of their discriminant ability. This form of integration, leading to the boosted HMM, permits AdaBoost to find the best features first, and then uses the HMM to exploit the dynamic information inherent in the signal.
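The two ideas in the abstract — AdaBoost-based selection of discriminative features, followed by fusion of the audio and visual HMM likelihoods — can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the feature names (`lip_width`, `lip_height`), the toy data, the stream weight `alpha`, and the per-class log-likelihood values are invented, not taken from the paper, and the fusion shown is a simple weighted sum of stream log-likelihoods rather than the authors' exact integration scheme.

```python
# --- Toy AdaBoost-style feature selection (one boosting round) --------
# A weak learner is a threshold stump; AdaBoost ranks features by the
# weighted error of their best stump. All data are illustrative.

def stump_error(values, labels, weights, threshold):
    """Weighted classification error of the stump: predict +1 if v > t."""
    err = 0.0
    for v, y, w in zip(values, labels, weights):
        pred = 1 if v > threshold else -1
        if pred != y:
            err += w
    return err

def select_best_feature(features, labels, weights):
    """Pick the feature whose best stump has the lowest weighted error,
    as AdaBoost would on its first round (candidate thresholds are taken
    at the observed feature values)."""
    best = None
    for name, values in features.items():
        for t in values:
            e = stump_error(values, labels, weights, t)
            if best is None or e < best[1]:
                best = (name, e)
    return best[0]

labels = [1, 1, -1, -1]
weights = [0.25] * 4                      # uniform initial AdaBoost weights
features = {
    "lip_width":  [0.9, 0.8, 0.2, 0.1],   # cleanly separates the classes
    "lip_height": [0.5, 0.1, 0.6, 0.2],   # poorly discriminative
}
chosen = select_best_feature(features, labels, weights)

# --- Late fusion of audio and visual HMM log-likelihoods -------------
# Hypothetical per-phoneme log-likelihoods from two separately trained
# HMMs; classification fuses the streams with an assumed weight alpha.
audio_loglik  = {"p": -12.0, "b": -10.5, "m": -15.2}
visual_loglik = {"p": -8.1,  "b": -9.7,  "m": -7.9}

def fuse_and_classify(audio_ll, visual_ll, alpha=0.6):
    """Weighted sum of the two streams' log-likelihoods, then argmax."""
    fused = {c: alpha * audio_ll[c] + (1.0 - alpha) * visual_ll[c]
             for c in audio_ll}
    return max(fused, key=fused.get), fused

label, scores = fuse_and_classify(audio_loglik, visual_loglik)
```

With these toy numbers the selection step prefers `lip_width` (its best stump reaches zero weighted error), and the fused scores pick the class whose weighted log-likelihood is highest; in a real system the HMM log-likelihoods would come from decoding the acoustic and visual observation sequences.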