A Bayesian approach to audio-visual speaker identification
AVBPA'03 Proceedings of the 4th international conference on Audio- and video-based biometric person authentication
This paper explores the fusion of audio and visual evidence through a multi-level hybrid fusion architecture based on dynamic Bayesian networks (DBN), which combines model-level and decision-level fusion to achieve higher performance. For model-level fusion, a new audio-visual correlative model (AVCM) based on a DBN is proposed, which describes both the inter-correlations and the loose timing synchronicity between the audio and video streams. Experiments on the CMU database and on our own database demonstrate that the proposed methods improve the accuracy of audio-visual bimodal speaker identification at all acoustic signal-to-noise ratios (SNR) from 0 dB to 30 dB under varying acoustic conditions.
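The abstract does not give the fusion equations, but the decision-level half of such an architecture is commonly a weighted combination of per-speaker log-likelihoods from the audio and video classifiers, with the weight tuned to the acoustic SNR. The following is a minimal illustrative sketch under that assumption; the function names and the fixed weighting scheme are hypothetical, not taken from the paper:

```python
def fuse_scores(audio_loglik, video_loglik, alpha):
    """Weighted decision-level fusion of per-speaker log-likelihoods.

    alpha in [0, 1] weights the audio stream; in practice it would be
    tuned to the acoustic SNR (smaller alpha in noisier conditions).
    This weighting rule is an illustrative assumption, not the paper's AVCM.
    """
    return [alpha * a + (1.0 - alpha) * v
            for a, v in zip(audio_loglik, video_loglik)]

def identify(audio_loglik, video_loglik, alpha=0.5):
    """Return the index of the enrolled speaker with the highest fused score."""
    scores = fuse_scores(audio_loglik, video_loglik, alpha)
    return max(range(len(scores)), key=scores.__getitem__)

# Example: three enrolled speakers; the audio model favors speaker 2,
# the video model favors speaker 1.
audio = [-10.0, -12.0, -9.5]
video = [-8.0, -6.0, -9.0]
print(identify(audio, video, alpha=0.3))  # low alpha: video dominates -> 1
print(identify(audio, video, alpha=0.9))  # high alpha: audio dominates -> 2
```

Lowering alpha as the SNR drops is what lets the visual stream carry the decision in noisy conditions, which is the intuition behind the reported gains from 0 dB to 30 dB.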