In this paper, we describe a text-dependent audio-visual speaker identification approach that combines face recognition with an audio-visual speech-based identification system. The temporal sequences of audio and visual observations, obtained from the acoustic speech signal and the shape of the mouth, are modeled with a set of coupled hidden Markov models (CHMMs), one for each phoneme-viseme pair and for each person in the database. The CHMM is well suited to this task because it captures the natural asynchrony between the audio and visual states as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with a face recognition likelihood computed using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only and video-only speaker identification at all acoustic signal-to-noise ratios (SNRs) from 5 to 30 dB.
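The final identification step described above is a late fusion of per-speaker likelihoods from the two subsystems. A minimal sketch of such score-level fusion is shown below; the weighting scheme, the `alpha` parameter, and all numeric scores are illustrative assumptions, not values taken from the paper:

```python
# Hedged sketch: late fusion of per-speaker log-likelihoods from an
# audio-visual speech model (e.g. a CHMM) and a face model (e.g. an EHMM).
# The linear weight `alpha` is a hypothetical choice; in practice it could
# be tuned per acoustic SNR level.

def identify_speaker(av_loglik, face_loglik, alpha=0.7):
    """Return the speaker ID maximizing the fused score.

    av_loglik, face_loglik: dicts mapping speaker ID -> log-likelihood.
    alpha: weight on the audio-visual stream (1 - alpha on the face stream).
    """
    fused = {
        spk: alpha * av_loglik[spk] + (1.0 - alpha) * face_loglik[spk]
        for spk in av_loglik
    }
    return max(fused, key=fused.get)

# Toy example with three enrolled speakers (all numbers invented):
av = {"s1": -120.0, "s2": -98.5, "s3": -110.2}
face = {"s1": -45.0, "s2": -60.3, "s3": -40.1}
print(identify_speaker(av, face))  # -> s2 (best fused score)
```

With `alpha=0.7` the fused scores are -97.5, -87.04, and -89.17, so speaker `s2` wins even though `s3` has the best face score alone; lowering `alpha` shifts the decision toward the face modality.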