Discriminatory information about a person's identity is multimodal, yet most person recognition systems are unimodal, relying, for example, on facial appearance alone. To exploit the complementary nature of different modes of information and to increase pattern recognition robustness to test-signal degradation, we developed a multiple-expert biometric person identification system that combines information from three experts: face, visual speech, and audio. The system performs multimodal fusion automatically and without supervision, adapting to the local performance and output reliability of each expert. The expert weightings are chosen automatically so that the reliability measure of the combined scores is maximized. To test system robustness to train/test mismatch, we degraded the audio and visual signals with a broad range of Gaussian noise and JPEG compression, respectively. Experiments were carried out on the XM2VTS database. The multimodal expert system outperformed each of the single experts in all comparisons. At the most severe audio and visual mismatch levels tested, the audio, mouth, face, and tri-expert fusion accuracies were 37.1%, 48%, 75%, and 92.7%, respectively; the fused result represents a relative improvement of 23.6% over the best-performing single expert.
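The reliability-maximizing weighted fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `reliability` proxy (the gap between the top two fused scores) and the coarse grid search over the weight simplex are both assumptions made for the example.

```python
import itertools
import numpy as np

def reliability(scores):
    """Illustrative reliability proxy (an assumption, not the paper's
    measure): the gap between the best and second-best candidate
    scores. A larger gap suggests a more confident decision."""
    top2 = np.sort(scores)[-2:]
    return top2[1] - top2[0]

def fuse(expert_scores, step=0.1):
    """Weighted-sum fusion of per-expert score vectors over the same
    candidate identities. Weights are searched on a coarse simplex
    grid and chosen to maximize the reliability of the fused scores,
    mirroring the unsupervised weight selection described in the text."""
    n_experts = len(expert_scores)
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_r = None, -np.inf
    for w in itertools.product(grid, repeat=n_experts):
        if abs(sum(w) - 1.0) > 1e-9:  # keep only weights summing to 1
            continue
        fused = sum(wi * s for wi, s in zip(w, expert_scores))
        r = reliability(fused)
        if r > best_r:
            best_r, best_w = r, w
    fused = sum(wi * s for wi, s in zip(best_w, expert_scores))
    return np.asarray(fused), best_w

# Hypothetical scores from face, visual-speech, and audio experts
# over three candidate identities (values invented for illustration).
face  = np.array([0.90, 0.10, 0.00])
mouth = np.array([0.40, 0.35, 0.25])
audio = np.array([0.50, 0.30, 0.20])
fused, weights = fuse([face, mouth, audio])
```

Because the weights adapt to whichever expert yields the most separable fused scores, a degraded expert (e.g. audio under heavy Gaussian noise) is automatically down-weighted at test time, which is the behavior the abstract attributes to the tri-expert system.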