Person identification using automatic integration of speech, lip, and face experts

Authors:
Niall A. Fox; Ralph Gross; Philip de Chazal; Jeffery F. Cohn; Richard B. Reilly
Affiliations:
University College Dublin, Dublin, Ireland; Carnegie Mellon University, Pittsburgh, PA; University College Dublin, Dublin, Ireland; Carnegie Mellon University, Pittsburgh, PA; University College Dublin, Dublin, Ireland
Venue:
WBMA '03 Proceedings of the 2003 ACM SIGMM workshop on Biometrics methods and applications
Year:
2003



Abstract

This paper presents a multi-expert person identification system that integrates three separate experts, which use audio features, static face images, and lip motion features respectively. Audio-based identification used a text-dependent hidden Markov model (HMM) methodology. Lip motion was modeled using Gaussian probability density functions. Static-image-based identification used the FaceIt system. Experiments were conducted with 251 subjects from the XM2VTS audio-visual database. The three experts were combined by late integration with automatically determined weights. The integration strategy adapts automatically to the audio noise conditions. Integrating the three experts improved person identification accuracy under both clean and noisy audio conditions compared with the audio-only case. The maximum accuracies achieved for audio, FaceIt, lip motion, and tri-expert identification were 98%, 93.22%, 86.37%, and 100% respectively. Bi-expert integration of the two visual experts achieved a maximum identification accuracy of 96.8%, which is comparable to the best audio accuracy of 98%.
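
The abstract does not say how the automatic weights are computed, so the following is only a minimal sketch of weighted late integration under stated assumptions, not the authors' method. It assumes each expert outputs one log-likelihood-style score per enrolled subject, converts those scores to posteriors with a softmax, and takes the gap between the best and second-best posterior as a hypothetical reliability measure. The function names `softmax`, `reliability`, and `fuse_experts`, and the sample scores, are illustrative only.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Convert one expert's per-subject scores into posteriors that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reliability(posteriors: List[float]) -> float:
    """Assumed confidence measure: gap between the two largest posteriors."""
    ranked = sorted(posteriors, reverse=True)
    return ranked[0] - ranked[1]


def fuse_experts(expert_scores: Dict[str, List[float]]) -> Tuple[int, Dict[str, float]]:
    """Weighted-sum late fusion; returns (winning subject index, expert weights)."""
    posteriors = {name: softmax(s) for name, s in expert_scores.items()}
    gaps = {name: reliability(p) for name, p in posteriors.items()}
    total_gap = sum(gaps.values())
    if total_gap > 0:
        weights = {name: g / total_gap for name, g in gaps.items()}
    else:
        # No expert is confident: fall back to equal weights.
        weights = {name: 1.0 / len(gaps) for name in gaps}
    n_subjects = len(next(iter(posteriors.values())))
    fused = [
        sum(weights[name] * posteriors[name][i] for name in posteriors)
        for i in range(n_subjects)
    ]
    best = max(range(n_subjects), key=lambda i: fused[i])
    return best, weights


# Illustrative scores for three subjects (made-up numbers, not from the paper).
# The audio scores are nearly flat, as might happen under heavy audio noise.
scores = {
    "audio": [-10.2, -10.0, -10.1],
    "face": [-3.0, -7.0, -6.0],
    "lip": [-5.0, -5.5, -6.0],
}
best_subject, expert_weights = fuse_experts(scores)
```

In this sketch the nearly flat audio scores yield a small posterior gap, so the audio expert receives a low weight and the fused decision follows the visual experts. This is one simple way a fusion rule could adapt to audio noise without being given the noise level explicitly.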