Dynamic modality weighting for multi-stream hmms inaudio-visual speech recognition

Authors:
Mihai Gurban;Jean-Philippe Thiran;Thomas Drugman;Thierry Dutoit
Affiliations:
École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland;École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland;Faculté Polytechnique de Mons, Mons, Belgium;Faculté Polytechnique de Mons, Mons, Belgium
Venue:
ICMI '08 Proceedings of the 10th international conference on Multimodal interfaces
Year:
2008

Citing 2
Cited 0

Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus

EURASIP Journal on Applied Signal Processing
Audio-visual speech modeling for continuous speech recognition

IEEE Transactions on Multimedia

Quantified Score

Hi-index	0.00

Visualization

Abstract

Merging decisions from different modalities is a crucial problem in Audio-Visual Speech Recognition. To solve this, state synchronous multi-stream HMMs have been proposed for their important advantage of incorporating stream reliability in their fusion scheme. This paper focuses on stream weight adaptation based on modality confidence estimators. We assume different and time-varying environment noise, as can be encountered in realistic applications, and, for this, adaptive methods are best suited. Stream reliability is assessed directly through classifier outputs since they are not specific to either noise type or level. The influence of constraining the weights to sum to one is also discussed.