In this paper, we propose a robust multilevel fusion strategy involving cascaded multimodal fusion of audio-lip-face motion, correlation and depth features for biometric person authentication. The proposed approach combines the information from several audio-video based modules, namely an audio-lip motion module, an audio-lip correlation module, and a 2D+3D motion-depth fusion module, and performs hybrid cascaded fusion in an automatic, unsupervised and adaptive manner by adapting to the local performance of each module. This is done by taking the output-score-based reliability estimates (confidence measures) of each module into account. The module weightings are determined automatically such that the reliability measure of the combined scores is maximised. To test the robustness of the proposed approach, the audio and visual speech (mouth) modalities are degraded to emulate various levels of train/test mismatch, using additive white Gaussian noise for the audio signals and JPEG compression for the video signals. The results show improved fusion performance across the tested range of audio and video degradation, compared to the individual module performances. Experiments on the 3D stereovision database AVOZES show that, at severe levels of audio and video mismatch, the EERs for the audio, mouth, 3D face, and tri-module (audio-lip motion, correlation and depth) fusion were 42.9%, 32%, 15%, and 7.3%, respectively, on the biometric person authentication task.
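The adaptive weighting described above can be illustrated with a minimal sketch of confidence-normalized weighted-sum score fusion. This is a simplified stand-in, not the paper's actual method: the function names, the reliability measure, and the flat (non-cascaded) topology here are all assumptions for illustration.

```python
import numpy as np

def reliability_weights(confidences):
    """Normalize per-module confidence measures into fusion weights.

    A hypothetical reliability estimate stands in for the paper's
    output-score-based confidence measure.
    """
    c = np.asarray(confidences, dtype=float)
    total = c.sum()
    if total == 0:
        # No module is reliable: fall back to equal weighting.
        return np.full(len(c), 1.0 / len(c))
    return c / total

def fuse_scores(module_scores, confidences):
    """Weighted-sum fusion: more reliable modules dominate the decision."""
    w = reliability_weights(confidences)
    return float(np.dot(w, np.asarray(module_scores, dtype=float)))

# Example: three modules (audio-lip motion, audio-lip correlation,
# 2D+3D motion-depth). Audio is degraded, so its confidence is low
# and the fused score leans on the visual modules.
scores = [0.40, 0.75, 0.85]   # per-module match scores in [0, 1]
conf = [0.2, 0.9, 0.9]        # hypothetical confidence measures
fused = fuse_scores(scores, conf)
```

With these numbers the weights become [0.1, 0.45, 0.45], so the fused score (0.76) sits well above the unweighted mean (about 0.67), reflecting the down-weighting of the noisy audio module.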