Discriminatory information about a person's identity is multimodal, yet most person recognition systems are unimodal, relying, for example, on facial appearance alone. To exploit the complementary nature of different modes of information and to increase pattern recognition robustness to test-signal degradation, we developed a multiple-expert biometric person identification system that combines information from three experts: face, visual speech, and audio. The system performs multimodal fusion automatically and without supervision, adapting to the local performance and output reliability of each expert. The expert weightings are chosen automatically so that the reliability measure of the combined scores is maximized. To test system robustness to train/test mismatch, we degraded the audio and visual signals with a broad range of Gaussian noise and JPEG compression, respectively. Experiments were carried out on the XM2VTS database. The multimodal expert system outperformed each of the single experts in all comparisons. At the most severe audio and visual mismatch levels tested, the audio, mouth, face, and tri-expert fusion accuracies were 37.1%, 48%, 75%, and 92.7%, respectively; the fused result represents a relative improvement of 23.6% over the best-performing single expert.
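The reliability-maximizing weighted fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `reliability` proxy (the gap between the top two fused scores) and the coarse grid search over the weight simplex are both assumptions made for the example.

```python
import itertools
import numpy as np

def reliability(scores):
    """Illustrative reliability proxy (an assumption, not the paper's
    measure): the gap between the best and second-best candidate
    scores. A larger gap suggests a more confident decision."""
    top2 = np.sort(scores)[-2:]
    return top2[1] - top2[0]

def fuse(expert_scores, step=0.1):
    """Weighted-sum fusion of per-expert score vectors over the same
    candidate identities. Weights are searched on a coarse simplex
    grid and chosen to maximize the reliability of the fused scores,
    mirroring the unsupervised weight selection described in the text."""
    n_experts = len(expert_scores)
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_r = None, -np.inf
    for w in itertools.product(grid, repeat=n_experts):
        if abs(sum(w) - 1.0) > 1e-9:  # keep only weights summing to 1
            continue
        fused = sum(wi * s for wi, s in zip(w, expert_scores))
        r = reliability(fused)
        if r > best_r:
            best_r, best_w = r, w
    fused = sum(wi * s for wi, s in zip(best_w, expert_scores))
    return np.asarray(fused), best_w

# Hypothetical scores from face, visual-speech, and audio experts
# over three candidate identities (values invented for illustration).
face  = np.array([0.90, 0.10, 0.00])
mouth = np.array([0.40, 0.35, 0.25])
audio = np.array([0.50, 0.30, 0.20])
fused, weights = fuse([face, mouth, audio])
```

Because the weights adapt to whichever expert yields the most separable fused scores, a degraded expert (e.g. audio under heavy Gaussian noise) is automatically down-weighted at test time, which is the behavior the abstract attributes to the tri-expert system.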