We present work on improving the performance of automatic speech recognizers by using additional visual information (lip-/speechreading), achieving error reductions of up to 50%. This paper focuses on different methods of combining the visual and acoustic data to improve recognition performance. We demonstrate this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. We have developed adaptive combination methods at several levels of the recognition network; in some cases, additional information such as the estimated signal-to-noise ratio (SNR) is used. Results for the different combination methods are shown for clean speech and for data with artificial noise (white noise, music, motor noise). The new combination methods adapt automatically to varying noise conditions, making hand-tuned parameters unnecessary.
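The SNR-adaptive combination described above can be illustrated with a minimal sketch: the audio and visual streams each produce per-class scores, and a weight derived from the estimated SNR shifts trust toward the visual stream as the acoustic channel degrades. The linear SNR-to-weight mapping and the function names below are illustrative assumptions, not the paper's actual network-level combination rule.

```python
import numpy as np

def snr_to_audio_weight(snr_db, lo_db=0.0, hi_db=30.0):
    """Map an estimated SNR (in dB) to an audio-stream weight in [0, 1].

    Linear ramp (an assumed mapping, for illustration only): at lo_db or
    below, rely entirely on the visual stream; at hi_db or above, rely
    entirely on the acoustic stream.
    """
    return float(np.clip((snr_db - lo_db) / (hi_db - lo_db), 0.0, 1.0))

def fuse_log_scores(audio_log_scores, visual_log_scores, snr_db):
    """Weighted log-linear fusion of per-class stream scores.

    The weight adapts automatically to the current noise estimate, so no
    hand-tuned fusion parameter is needed.
    """
    lam = snr_to_audio_weight(snr_db)
    return lam * np.asarray(audio_log_scores) \
        + (1.0 - lam) * np.asarray(visual_log_scores)

# Example: at 15 dB (midpoint), both streams contribute equally.
fused = fuse_log_scores([0.0, -1.0], [-1.0, 0.0], snr_db=15.0)
```

With this scheme, clean speech (high SNR) drives the weight toward the acoustic scores, while heavy noise pushes it toward the lipreading scores, mirroring the adaptive behavior the paper reports.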