Bayesian Learning for Neural Networks
Bayesian Learning for Neural Networks
Data Fusion for Sensory Information Processing Systems
Data Fusion for Sensory Information Processing Systems
Lip feature extraction using red exclusion
VIP '00 Selected papers from the Pan-Sydney workshop on Visualisation - Volume 2
Audio-visual speech recognition using red exclusion and neural networks
ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science - Volume 4
Sensor fusion weighting measures in Audio-Visual Speech Recognition
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on multimodal processing in speech-based interactions
Hi-index | 0.00 |
This paper analyzes the issue of catastrophic fusion, a problem thatoccurs in multimodal recognition systems that integrate the output from several modules while working in non-stationary environments. Forconcreteness we frame the analysis with regard to the problem of automaticaudio visual speech recognition (AVSR), but the issues at hand are verygeneral and arise in multimodal recognition systems which need to work in awide variety of contexts. Catastrophic fusion is said to have occurred whenthe performance of a multimodal system is inferior to the performance ofsome isolated modules, e.g., when the performance of the audio visualspeech recognition system is inferior to that of the audio system alone.Catastrophic fusion arises because recognition modules make implicitassumptions and thus operate correctly only within a certain context.Practice shows that when modules are tested in contexts inconsistent withtheir assumptions, their influence on the fused product tends to increase,with catastrophic results. We propose a principled solution to this problembased upon Bayesian ideas of competitive models and inferencerobustification. We study the approach analytically on a classic Gaussiandiscrimination task and then apply it to a realistic problem on audiovisual speech recognition (AVSR) with excellent results.