Conventional speaker identification systems analyze facial and audio features separately. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The approach uses Mel-frequency cepstral coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection from video, and eigenface features (EFF) and Gaussian mixture models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion combines PCA for face and GMM for audio through AND voting. Feature-level fusion is investigated by combining MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy and is almost 15% more efficient than feature-level fusion.
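The two fusion schemes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that MFCC frames and eigenface (PCA) projections have already been extracted, that each speaker's audio model is a diagonal-covariance GMM, and that the face decision is a nearest-neighbour match in eigenface space. All function names, the `(weights, means, variances)` model layout, and the toy numbers are hypothetical.

```python
import math


def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))


def audio_decision(mfcc_frames, speaker_gmms):
    """Pick the speaker whose GMM scores the MFCC frames highest (assumed rule)."""
    scores = {
        spk: sum(diag_gmm_loglik(f, *gmm) for f in mfcc_frames)
        for spk, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get)


def face_decision(face_vec, speaker_faces):
    """Pick the nearest enrolled face in eigenface space (assumed rule)."""
    return min(speaker_faces, key=lambda spk: math.dist(face_vec, speaker_faces[spk]))


def and_vote(audio_id, face_id):
    """Decision-level fusion: accept an identity only if both modalities agree."""
    return audio_id if audio_id == face_id else None


def fuse_features(mfcc_frame, face_vec):
    """Feature-level fusion: concatenate audio and face features into one vector,
    which could then be used to train a single hybrid GMM per speaker."""
    return list(mfcc_frame) + list(face_vec)


# Toy enrollment data (hypothetical): one-component GMMs and 2-D face projections.
speaker_gmms = {
    "s1": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
    "s2": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
speaker_faces = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}

frames = [[0.1, -0.2], [0.3, 0.1]]
audio_id = audio_decision(frames, speaker_gmms)
face_id = face_decision([0.9, 0.1], speaker_faces)
identity = and_vote(audio_id, face_id)
```

With AND voting, a disagreement between the two modalities yields no identity (`None` here), so both must agree before a speaker is accepted. How the original system pairs audio frames with face features for the hybrid GMM is not stated in the abstract; plain concatenation is used here only as an assumption.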