Maximum Likelihood Weighting of Dynamic Speech Features for CDHMM Speech Recognition
ICASSP '97: Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 2
Audio-Visual Interaction in Multimedia Communication
ICASSP '97: Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 1
Integrating audio and visual information to provide highly robust speech recognition
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 2
Cross-modal prediction in audio-visual communication
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 4
Multi-Modal Temporal Asynchronicity Modeling by Product HMMs for Robust Audio-Visual Speech Recognition
ICMI '02: Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Audio-visual person authentication using lip-motion from orientation maps
Pattern Recognition Letters
Asynchrony modeling for audio-visual speech recognition
HLT '02: Proceedings of the Second International Conference on Human Language Technology Research
Synergy of Lip-Motion and Acoustic Features in Biometric Speech and Speaker Recognition
IEEE Transactions on Computers
Data driven approaches to speech and language processing
Nonlinear Speech Modeling and Applications
An audio-visual imposture scenario by talking face animation
Nonlinear Speech Modeling and Applications
This paper describes the integration of audio and visual speech information for robust adaptive speech processing. Because both the acoustic speech signal and the visual configuration of the face are produced by the same speech organs, the two modalities are strongly correlated and often complement each other. Building on this relationship, the paper presents two applications: bimodal speech recognition that integrates audio-visual information to remain robust under acoustic noise, and speaking-face synthesis driven by the correlation between audio and visual speech.
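A common way to integrate the two streams in bimodal recognition, consistent with the HMM-based systems cited above, is to combine per-stream log-likelihoods with an exponential stream weight so that the visual stream can dominate when the audio is noisy. The sketch below is illustrative only and not the paper's exact method; the function names, candidate words, and score values are all hypothetical.

```python
def fused_log_likelihood(audio_ll, visual_ll, lam):
    """Weighted combination of audio and visual log-likelihoods.

    lam in [0, 1] is the audio stream weight; (1 - lam) weights the
    visual stream. lam is typically lowered as acoustic SNR drops.
    """
    return lam * audio_ll + (1.0 - lam) * visual_ll

def recognize(candidates, lam):
    """Return the candidate word with the highest fused score.

    `candidates` maps word -> (audio_log_likelihood, visual_log_likelihood).
    """
    return max(candidates, key=lambda w: fused_log_likelihood(*candidates[w], lam))

# Hypothetical scores: the audio model prefers "yes", the visual model
# prefers "no". A high lam (clean audio) trusts the acoustic stream;
# a low lam (noisy audio) shifts the decision to the visual stream.
scores = {"yes": (-10.0, -12.0), "no": (-11.0, -9.0)}
print(recognize(scores, lam=0.9))  # -> yes
print(recognize(scores, lam=0.1))  # -> no
```

In practice the weight is chosen per noise condition (e.g. estimated from SNR) or trained discriminatively; the fusion rule itself stays this simple linear combination in the log domain.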