Maximum Likelihood Weighting of Dynamic Speech Features for CDHMM Speech Recognition
ICASSP '97: Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 2
Audio-Visual Interaction in Multimedia Communication
ICASSP '97: Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 1
Integrating audio and visual information to provide highly robust speech recognition
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 2
Cross-modal prediction in audio-visual communication
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 4
Multi-Modal Temporal Asynchronicity Modeling by Product HMMs for Robust Audio-Visual Speech Recognition
ICMI '02: Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Audio-visual person authentication using lip-motion from orientation maps
Pattern Recognition Letters
Asynchrony modeling for audio-visual speech recognition
HLT '02: Proceedings of the Second International Conference on Human Language Technology Research
Synergy of Lip-Motion and Acoustic Features in Biometric Speech and Speaker Recognition
IEEE Transactions on Computers
Data driven approaches to speech and language processing
Nonlinear Speech Modeling and Applications
An audio-visual imposture scenario by talking face animation
Nonlinear Speech Modeling and Applications
This paper describes the integration of audio and visual speech information for robust adaptive speech processing. Because both the acoustic speech signal and the visual configuration of the face are produced by the same speech organs, the two modalities are strongly correlated and often complement each other. Building on this relationship, the paper presents two applications: bimodal speech recognition that integrates audio-visual information to remain robust under acoustic noise, and speaking-face synthesis driven by the correlation between audio and visual speech.
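A common way to integrate the two streams in bimodal recognition, consistent with the HMM-based systems cited above, is to combine per-stream log-likelihoods with an exponential stream weight so that the visual stream can dominate when the audio is noisy. The sketch below is illustrative only and not the paper's exact method; the function names, candidate words, and score values are all hypothetical.

```python
def fused_log_likelihood(audio_ll, visual_ll, lam):
    """Weighted combination of audio and visual log-likelihoods.

    lam in [0, 1] is the audio stream weight; (1 - lam) weights the
    visual stream. lam is typically lowered as acoustic SNR drops.
    """
    return lam * audio_ll + (1.0 - lam) * visual_ll

def recognize(candidates, lam):
    """Return the candidate word with the highest fused score.

    `candidates` maps word -> (audio_log_likelihood, visual_log_likelihood).
    """
    return max(candidates, key=lambda w: fused_log_likelihood(*candidates[w], lam))

# Hypothetical scores: the audio model prefers "yes", the visual model
# prefers "no". A high lam (clean audio) trusts the acoustic stream;
# a low lam (noisy audio) shifts the decision to the visual stream.
scores = {"yes": (-10.0, -12.0), "no": (-11.0, -9.0)}
print(recognize(scores, lam=0.9))  # -> yes
print(recognize(scores, lam=0.1))  # -> no
```

In practice the weight is chosen per noise condition (e.g. estimated from SNR) or trained discriminatively; the fusion rule itself stays this simple linear combination in the log domain.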