Audio-visual speaker identification based on the use of dynamic audio and visual features

Authors:
Niall Fox; Richard B. Reilly
Affiliations:
Dept. of Electronic and Electrical Engineering, University College Dublin, Belfield, Dublin, Ireland (both authors)
Venue:
AVBPA'03: Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication
Year:
2003

Abstract

This paper presents a speaker identification system based on dynamic features of both the audio and visual modalities. Speakers are modeled with text-dependent hidden Markov models (HMMs). Both early integration (fusing the audio and visual features) and late integration (fusing the outputs of separate audio and visual classifiers) are investigated. Experiments are carried out on 252 speakers from the XM2VTS database. The experimental results show that adding the dynamic visual information improves speaker identification accuracy over the audio-only case under both clean and noisy audio conditions. The best audio-only, visual-only and audio-visual identification accuracies achieved were 86.91%, 57.14% and 94.05%, respectively.
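The abstract does not specify how the dynamic features are computed or exactly how the two modalities are combined, so the following is only a sketch of the generic techniques under stated assumptions, not the authors' implementation. It assumes that the dynamic features are regression-based delta coefficients (window of N frames on each side, with the first/last frame repeated at the edges), that early integration means concatenating synchronous audio and visual frames (with the visual stream already resampled to the audio frame rate), and that late integration is a weighted sum of per-speaker log-likelihood scores produced by separate audio and visual HMMs. The function names, the window size and the audio weight of 0.7 are hypothetical; HMM training and scoring are not shown.

```python
from typing import List, Sequence

Frame = List[float]


def delta(frames: Sequence[Sequence[float]], n_win: int = 2) -> List[Frame]:
    """Regression-based delta (dynamic) coefficients (assumed formula):
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    repeating the first/last frame where t-n or t+n falls outside the sequence.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, n_win + 1))
    out: List[Frame] = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for n in range(1, n_win + 1):
            ahead = frames[min(t + n, T - 1)]   # clamp at the last frame
            behind = frames[max(t - n, 0)]      # clamp at the first frame
            for i in range(len(d)):
                d[i] += n * (ahead[i] - behind[i])
        out.append([x / denom for x in d])
    return out


def early_fusion(audio: Sequence[Sequence[float]],
                 visual: Sequence[Sequence[float]]) -> List[Frame]:
    """Early (feature-level) integration: concatenate synchronous frames.
    Assumes the visual stream was already resampled to the audio frame rate;
    a single HMM per speaker would then be trained on the joint vectors."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


def late_fusion(audio_scores: Sequence[float],
                visual_scores: Sequence[float],
                weight: float = 0.7) -> int:
    """Late integration: weighted sum of per-speaker log-likelihoods from
    separate audio and visual models. `weight` is a hypothetical audio weight
    in [0, 1]. Returns the index of the best-scoring speaker."""
    if len(audio_scores) != len(visual_scores):
        raise ValueError("score lists must cover the same speakers")
    fused = [weight * a + (1.0 - weight) * v
             for a, v in zip(audio_scores, visual_scores)]
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # A linear ramp: interior frames get delta 1.0, edge frames are damped by clamping.
    print(delta([[0.0], [1.0], [2.0], [3.0], [4.0]]))  # [[0.5], [0.8], [1.0], [0.8], [0.5]]

    # Made-up log-likelihoods for three enrolled speakers (illustration only).
    audio = [-120.0, -118.0, -125.0]
    visual = [-60.0, -70.0, -55.0]
    print(late_fusion(audio, visual))              # 0 (audio alone would pick speaker 1)
    print(late_fusion(audio, visual, weight=1.0))  # 1 (audio-only decision)
```

With these made-up scores the audio stream alone favors speaker 1, while the fused score favors speaker 0. One common option, not stated in the abstract, is to lower the audio weight as the audio becomes noisier, which is part of the appeal of late integration: each modality keeps its own model and only the combination rule changes.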