Information fusion and decision cascading for audio-visual speaker recognition based on time-varying stream reliability prediction

Authors:
U. V. Chaudhari; G. N. Ramaswamy; G. Potamianos; C. Neti
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA (all authors)
Venue:
Proceedings of the 2003 International Conference on Multimedia and Expo (ICME '03), Volume 3
Year:
2003

Cited by (5):
Multimodal Person Recognition for Human-Vehicle Interaction (IEEE MultiMedia)
Multimodal speaker/speech recognition using lip motion, lip texture and audio (Signal Processing, Special section: Multimodal human-computer interfaces)
Biometric person authentication with liveness detection based on audio-visual fusion (International Journal of Biometrics)
The Persian linguistic based audio-visual data corpus, AVA II, considering coarticulation (MMM'10: Proceedings of the 16th International Conference on Advances in Multimedia Modeling)
Audiovisual diarization of people in video content (Multimedia Tools and Applications)

Abstract

We examine techniques for multimodal biometric information fusion for speaker verification and identification. In this approach, the reliability of each data stream (audio or video) is modeled with time-varying parameters. These parameters depend on the context created by that stream's local behavior. We study the complementary nature and the time-dependent relative reliability of the audio and video data for both verification and identification. The data were collected during a user's interaction with an automated system. Notably, the data are not artificially corrupted. Particular attention is given to verification and to its ability to refine identification decisions by indicating a level of confidence in the system's decisions. Results show that time-dependent fusion has a more striking effect on verification than on identification.
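The sketch below illustrates the general idea of time-varying stream weighting followed by an identification-then-verification cascade. It is a minimal example under stated assumptions, not the authors' method. It assumes that separate audio and video models produce a per-frame log-likelihood score for each enrolled speaker. It also uses a hypothetical reliability cue: the mean margin between the top two speaker scores in a sliding window. This cue sets per-frame stream weights that sum to 1.

The verification step is a stand-in for the paper's actual scoring. It accepts the top identity only if its fused score beats the best competing speaker by a threshold. The names `local_margin`, `stream_weights`, `fused_scores`, `identify_then_verify`, `half_window`, and `threshold` are all illustrative.

```python
from typing import List, Tuple

# scores[t][k]: log-likelihood of frame t under speaker model k (assumed input format)
Scores = List[List[float]]


def local_margin(scores: Scores, t: int, half_window: int) -> float:
    """Hypothetical reliability cue: mean (top-1 minus top-2) speaker-score margin
    over frames [t - half_window, t + half_window], clipped to the sequence."""
    lo = max(0, t - half_window)
    hi = min(len(scores), t + half_window + 1)
    margins = []
    for frame in scores[lo:hi]:
        best, second = sorted(frame, reverse=True)[:2]
        margins.append(best - second)
    return sum(margins) / len(margins)


def stream_weights(audio: Scores, video: Scores, half_window: int = 2) -> List[Tuple[float, float]]:
    """Per-frame (w_audio, w_video), summing to 1, proportional to each stream's local margin."""
    weights = []
    for t in range(len(audio)):
        ma = local_margin(audio, t, half_window)
        mv = local_margin(video, t, half_window)
        total = ma + mv
        if total <= 0.0:
            weights.append((0.5, 0.5))  # neither stream separates speakers: weight equally
        else:
            weights.append((ma / total, mv / total))
    return weights


def fused_scores(audio: Scores, video: Scores, half_window: int = 2) -> List[float]:
    """Utterance-level score per speaker: sum over frames of the time-weighted stream scores."""
    n_speakers = len(audio[0])
    totals = [0.0] * n_speakers
    for t, (wa, wv) in enumerate(stream_weights(audio, video, half_window)):
        for k in range(n_speakers):
            totals[k] += wa * audio[t][k] + wv * video[t][k]
    return totals


def identify_then_verify(audio: Scores, video: Scores, threshold: float,
                         half_window: int = 2) -> Tuple[int, bool]:
    """Cascade: identify the best-scoring speaker, then mark the decision as confident
    only if it beats the runner-up by at least `threshold` (stand-in verification test)."""
    totals = fused_scores(audio, video, half_window)
    ranked = sorted(range(len(totals)), key=lambda k: totals[k], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, totals[best] - totals[runner_up] >= threshold


if __name__ == "__main__":
    # Two speakers, three frames: audio separates them poorly except in frame 1,
    # while video separates them clearly throughout.
    audio = [[-1.0, -1.2], [-2.0, -1.0], [-1.0, -1.1]]
    video = [[-0.5, -3.0], [-0.6, -2.5], [-0.4, -2.8]]
    print(stream_weights(audio, video, half_window=0))
    print(identify_then_verify(audio, video, threshold=1.0, half_window=0))
```

In this toy input, the video stream gets most of the weight in frames 0 and 2, where the audio scores barely separate the speakers. The audio weight rises in frame 1, where the audio margin is larger. With `half_window=0`, the weights react to each frame alone. Larger windows smooth them over the local context, which is closer in spirit to weights that "depend on the context created by its local behavior".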