Extraction of Visual Features for Lipreading
IEEE Transactions on Pattern Analysis and Machine Intelligence
Designing robust multimodal systems for universal access
WUAUC'01 Proceedings of the 2001 EC/NSF workshop on Universal accessibility of ubiquitous computing: providing for the elderly
Fusion of Audio-Visual Information for Integrated Speech Processing
AVBPA '01 Proceedings of the Third International Conference on Audio- and Video-Based Biometric Person Authentication
The human-computer interaction handbook
Advances in the robust processing of multimodal speech and pen systems
Multimodal interface for human-machine communication
Multi-Modal Temporal Asynchronicity Modeling by Product HMMs for Robust Audio-Visual Speech Recognition
ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Advances in Robust Multimodal Interface Design
IEEE Computer Graphics and Applications
Dynamic Bayesian networks for audio-visual speech recognition
EURASIP Journal on Applied Signal Processing
A two-channel training algorithm for hidden Markov model and its application to lip reading
EURASIP Journal on Applied Signal Processing
Audio-visual speech recognition using lip information extracted from side-face images
EURASIP Journal on Audio, Speech, and Music Processing
Asynchrony modeling for audio-visual speech recognition
HLT '02 Proceedings of the second international conference on Human Language Technology Research
HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces
Audio-visual speech recognition based on AAM parameter and phoneme analysis of visual feature
PSIVT'11 Proceedings of the 5th Pacific Rim conference on Advances in Image and Video Technology - Volume Part I
Lipreading procedure for liveness verification in video authentication systems
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part I
Robust AAM-based audio-visual speech recognition against face direction changes
Proceedings of the 20th ACM international conference on Multimedia
Multiple cameras for audio-visual speech recognition in an automotive environment
Computer Speech and Language
Many human-machine interaction tasks require accurate automatic speech recognition in the presence of high levels of interfering noise. This paper shows that recognition accuracy can be improved by including data derived from a speaker's lip images. We describe how the audio and visual data are combined into composite feature vectors, and a hidden Markov model structure that allows for asynchrony between the audio and visual components. These ideas are applied to a speaker-dependent recognition task involving a small vocabulary and subject to interfering noise. The recognition results obtained using composite vectors and cross-product models are compared with those based on an audio-only feature vector, and the approach is shown to improve performance over a very wide range of noise levels.
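The two ideas in the abstract can be illustrated with a minimal sketch. Assuming the audio and visual streams have already been resampled to a common frame rate, composite feature vectors are simply frame-wise concatenations of the two streams; and in a cross-product (product) HMM, each composite state is a pair (audio state, visual state), so under an independence assumption the joint transition matrix is the Kronecker product of the per-stream transition matrices. Feature dimensions and the factorisation below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def composite_features(audio, visual):
    """Concatenate audio and visual feature streams frame by frame.

    Assumes both streams are already at a common frame rate;
    truncates to the shorter stream if their lengths differ.
    """
    n = min(len(audio), len(visual))
    return np.hstack([audio[:n], visual[:n]])

def product_transitions(A_audio, A_visual):
    """Cross-product HMM transition matrix.

    Each joint state is an (audio state, visual state) pair; assuming
    the two streams evolve independently, the joint transition
    probability factorises as a Kronecker product. Allowing the two
    indices to advance at different times is what models asynchrony.
    """
    return np.kron(A_audio, A_visual)

# Tiny example: 3-state left-to-right audio model, 2-state visual model.
A_a = np.array([[0.9, 0.1, 0.0],
                [0.0, 0.9, 0.1],
                [0.0, 0.0, 1.0]])
A_v = np.array([[0.8, 0.2],
                [0.0, 1.0]])
A_joint = product_transitions(A_a, A_v)
print(A_joint.shape)  # (6, 6): one row per (audio, visual) state pair
```

Because both component matrices are row-stochastic, the Kronecker product is too, so `A_joint` is a valid HMM transition matrix over the six composite states.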