Models for Audiovisual Fusion in a Noisy-Vowel Recognition Task
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
This paper presents a study of models for audiovisual (AV) fusion in a noisy-vowel recognition task. We progressively elaborate audiovisual models in order to respect the major principle demonstrated by human subjects in speech perception experiments (the “synergy” principle): audiovisual identification should always be more efficient than auditory-alone or visual-alone identification. We first recall that the efficiency of audiovisual speech recognition systems depends on the level at which they fuse sound and image: four AV architectures are presented, and two are selected for the remainder of the study. Secondly, we show the importance of providing a contextual input linked to the Signal-to-Noise Ratio (SNR) in the fusion process. Then we propose an original approach using an efficient nonlinear dimension reduction algorithm (“curvilinear components analysis”) in order to improve the performance of the two AV architectures. Furthermore, we show that this approach allows an easy and efficient estimation of the reliability of the audio sensor in relation to SNR, that this estimation can be used to control the AV fusion process, and that it significantly improves AV performance. Hence, altogether, nonlinear dimension reduction, context estimation and control of the fusion process enable us to respect the “synergy” criterion for the two most widely used architectures.
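The idea of controlling fusion with an SNR-linked reliability estimate can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a late-fusion setting in which each modality already yields per-class posterior probabilities, and it combines them with a weighted geometric mean whose audio weight grows with the estimated SNR. The function names `audio_weight` and `fuse_posteriors`, the sigmoid mapping, and its parameters `midpoint_db` and `slope` are all hypothetical choices made for illustration.

```python
import numpy as np


def audio_weight(snr_db, midpoint_db=0.0, slope=0.25):
    """Map an SNR estimate (dB) to an audio reliability weight in (0, 1).

    The sigmoid shape and its parameters are illustrative assumptions,
    not values taken from the paper.
    """
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint_db)))


def fuse_posteriors(p_audio, p_visual, snr_db):
    """Fuse per-class posteriors with an SNR-controlled geometric mean.

    fused[c] is proportional to p_audio[c]**alpha * p_visual[c]**(1 - alpha),
    where alpha = audio_weight(snr_db).
    """
    alpha = audio_weight(snr_db)
    eps = 1e-12  # guards against log(0)
    log_p = (alpha * np.log(np.asarray(p_audio, dtype=float) + eps)
             + (1.0 - alpha) * np.log(np.asarray(p_visual, dtype=float) + eps))
    log_p -= log_p.max()  # numerical stability before exponentiating
    fused = np.exp(log_p)
    return fused / fused.sum()


# Hypothetical posteriors over three vowel classes.
p_audio = [0.7, 0.2, 0.1]
p_visual = [0.2, 0.6, 0.2]

clean = fuse_posteriors(p_audio, p_visual, snr_db=30.0)   # audio dominates
noisy = fuse_posteriors(p_audio, p_visual, snr_db=-30.0)  # visual dominates
```

With this weighting, the fused decision follows the audio classifier when the SNR estimate is high and the visual classifier when it is low. Whether the fused system then satisfies the “synergy” criterion at intermediate SNRs depends on how well the reliability estimate tracks the true noise level.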