Models for Audiovisual Fusion in a Noisy-Vowel Recognition Task

Authors:
Pascal Teissier;Anne Guerin-Dugue;Jean-Luc Schwartz
Affiliations:
Laboratoire des Images et des Signaux LIS, INPG, 46 Av. Felix-Viallet, 38031 Grenoble Cedex 1/ Institut de la Communication Parlee CNRS UPRESA 5009/INPG-U. Stendhal ICP, INPG, 46 Av. Felix-Viall ...;Laboratoire des Images et des Signaux LIS, INPG, 46 Av. Felix-Viallet, 38031 Grenoble Cedex 1;Institut de la Communication Parlee CNRS UPRESA 5009/INPG-U. Stendhal ICP, INPG, 46 Av. Felix-Viallet, 38031 Grenoble Cedex 1
Venue:
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Year:
1998

Abstract

This paper presents a study of models for audiovisual (AV) fusion in a noisy-vowel recognition task. We progressively elaborate audiovisual models in order to respect the major principle demonstrated by human subjects in speech perception experiments (the “synergy” principle): audiovisual identification should always be more efficient than auditory-alone or visual-alone identification. We first recall that the efficiency of audiovisual speech recognition systems depends on the level at which they fuse sound and image: four AV architectures are presented, and two are selected for the remainder of the study. Secondly, we show the importance of providing a contextual input linked to the Signal-to-Noise Ratio (SNR) in the fusion process. Then we propose an original approach using an efficient nonlinear dimension reduction algorithm (“curvilinear components analysis”) in order to improve the performance of the two AV architectures. Furthermore, we show that this approach allows an easy and efficient estimation of the reliability of the audio sensor in relation to SNR, that this estimation can be used to control the AV fusion process, and that it significantly improves AV performance. Hence, altogether, nonlinear dimension reduction, context estimation and control of the fusion process enable us to respect the “synergy” criterion for the two most commonly used architectures.
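
The idea of controlling fusion with an estimate of audio reliability can be illustrated with a minimal sketch. This is not the authors' model: it assumes a late (decision-level) fusion in which per-class posteriors from an audio classifier and a visual classifier are combined by a weighted geometric mean, and the audio weight is a hypothetical sigmoid function of the estimated SNR. The names `audio_weight`, `fuse_posteriors` and `classify`, and the parameters `midpoint_db` and `slope`, are illustrative assumptions, not taken from the paper.

```python
import math


def audio_weight(snr_db, midpoint_db=0.0, slope=0.2):
    """Hypothetical mapping from estimated SNR (dB) to audio reliability in (0, 1).

    The sigmoid shape and its parameters are assumptions for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse_posteriors(p_audio, p_visual, snr_db):
    """Weighted geometric fusion of per-class posteriors, renormalized to sum to 1."""
    w = audio_weight(snr_db)
    eps = 1e-12  # avoids log(0) when a classifier assigns zero probability
    scores = [
        math.exp(w * math.log(pa + eps) + (1.0 - w) * math.log(pv + eps))
        for pa, pv in zip(p_audio, p_visual)
    ]
    total = sum(scores)
    return [s / total for s in scores]


def classify(p_audio, p_visual, snr_db):
    """Return the index of the class with the highest fused posterior."""
    fused = fuse_posteriors(p_audio, p_visual, snr_db)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Audio favours class 0, vision favours class 1 (made-up numbers).
    p_a = [0.7, 0.2, 0.1]
    p_v = [0.2, 0.7, 0.1]
    for snr in (30.0, 0.0, -30.0):
        print(snr, classify(p_a, p_v, snr), fuse_posteriors(p_a, p_v, snr))
```

In this sketch the decision follows the audio channel at high SNR and the visual channel at low SNR, which is the qualitative behaviour that an SNR-linked contextual input is meant to provide.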