Robust automatic human identification using face, mouth, and acoustic information

  • Authors:
  • Niall A. Fox, Ralph Gross, Jeffrey F. Cohn, Richard B. Reilly

  • Affiliations:
  • Dept. of Electronic and Electrical Engineering, University College Dublin, Belfield, Dublin 4, Ireland (N. A. Fox, R. B. Reilly); Robotics Institute, Carnegie Mellon University, Pittsburgh, PA (R. Gross, J. F. Cohn)

  • Venue:
  • AMFG'05: Proceedings of the Second International Conference on Analysis and Modelling of Faces and Gestures
  • Year:
  • 2005

Abstract

Discriminatory information about person identity is multimodal. Yet most person recognition systems are unimodal, e.g. relying on facial appearance alone. With a view to exploiting the complementary nature of different modes of information and increasing pattern recognition robustness to test signal degradation, we developed a multiple-expert biometric person identification system that combines information from three experts: face, visual speech, and audio. The system performs multimodal fusion in an automatic, unsupervised manner, adapting to the local performance and output reliability of each expert. The expert weightings are chosen automatically such that the reliability measure of the combined scores is maximized. To test system robustness to train/test mismatch, we used a broad range of Gaussian noise and JPEG compression levels to degrade the audio and visual signals, respectively. Experiments were carried out on the XM2VTS database. The multimodal expert system outperformed each of the single experts in all comparisons. At the most severe audio and visual mismatch levels tested, the audio, mouth, face, and tri-expert fusion accuracies were 37.1%, 48%, 75%, and 92.7%, respectively, representing a relative improvement of 23.6% over the best performing expert.
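
The abstract says the expert weightings are chosen automatically so that a reliability measure of the combined scores is maximized, but it does not spell out the measure or the search procedure. The following is a minimal sketch under stated assumptions: a margin-between-top-two-scores reliability proxy and a brute-force grid search over the weight simplex. The function names, the proxy, and the search strategy are illustrative, not the authors' implementation.

```python
import itertools
import numpy as np


def reliability(combined):
    """Reliability proxy: margin between the best and second-best
    combined scores (a larger margin suggests a more confident decision).
    NOTE: this margin criterion is an assumption; the paper's actual
    reliability measure may differ."""
    top_two = np.sort(combined)[-2:]
    return top_two[1] - top_two[0]


def fuse_experts(expert_scores, grid_points=11):
    """Weighted-sum fusion of per-expert match scores.

    expert_scores: (n_experts, n_identities) array, assumed already
    normalized to a common range. Exhaustively searches weight vectors
    on a simplex grid and keeps the one maximizing the reliability proxy.
    """
    n_experts, _ = expert_scores.shape
    grid = np.linspace(0.0, 1.0, grid_points)
    best_weights, best_rel = None, -np.inf
    for w in itertools.product(grid, repeat=n_experts):
        if not np.isclose(sum(w), 1.0):  # keep weights on the simplex
            continue
        combined = np.asarray(w) @ expert_scores
        rel = reliability(combined)
        if rel > best_rel:
            best_rel, best_weights = rel, np.asarray(w)
    combined = best_weights @ expert_scores
    return int(np.argmax(combined)), best_weights


# Toy example: audio, mouth, and face experts scoring four identities.
# The audio scores are nearly flat, mimicking a noise-degraded channel,
# so the search should shift weight toward the visual experts.
scores = np.array([
    [0.27, 0.28, 0.23, 0.22],  # audio expert (degraded)
    [0.10, 0.50, 0.20, 0.20],  # mouth (visual speech) expert
    [0.05, 0.70, 0.15, 0.10],  # face expert
])
identity, weights = fuse_experts(scores)
print(f"identified index: {identity}, weights: {weights}")
```

In this toy run the flat audio scores contribute little margin, so the selected weight vector leans on the mouth and face experts, which mirrors the adaptive behavior the abstract describes under audio degradation.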