User Verification by Combining Speech and Face Biometrics in Video

Authors:
Imran Naseem; Ajmal Mian
Affiliations:
School of Electrical, Electronic and Computer Engineering; School of Computer Science and Software Engineering, The University of Western Australia
Venue:
ISVC '08 Proceedings of the 4th International Symposium on Advances in Visual Computing, Part II
Year:
2008



Abstract

In this paper, physiological biometrics from the face are combined with behavioral biometrics from speech in video to achieve robust user authentication. The choice of biometrics is motivated by user convenience and by robustness to forgery, since it is hard to forge both biometrics simultaneously. We used Mel Frequency Cepstral Coefficients (MFCCs) for text-independent speaker recognition and local scale-invariant features for video-based face recognition. The results of the two classifiers were fused using a weighted sum rule, achieving an equal error rate of 0.6% on the VidTIMIT audio-visual database. We also performed identification experiments and achieved a combined identification rate of 99.13% on the same database.
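
The weighted sum rule and the equal error rate mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the fusion weights, the score normalization, or how the error rate was computed. The min-max normalization, the default weight of 0.5, the threshold sweep, and all function names below are assumptions made for illustration only.

```python
# Illustrative sketch only; names, weights, and normalization are assumptions,
# not details taken from the paper.

def min_max_normalize(scores):
    """Rescale scores to [0, 1] so both classifiers share a common range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(face_scores, speech_scores, face_weight=0.5):
    """Fuse per-trial similarity scores: w * face + (1 - w) * speech."""
    if len(face_scores) != len(speech_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    face = min_max_normalize(face_scores)
    speech = min_max_normalize(speech_scores)
    return [face_weight * f + (1.0 - face_weight) * s
            for f, s in zip(face, speech)]


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The returned
    value is the mean of the false accept and false reject rates at the
    threshold where the two rates are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(g < t for g in genuine) / len(genuine)
        far = sum(i >= t for i in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Made-up scores: the first two trials are genuine, the last two impostor.
    face = [0.9, 0.8, 0.2, 0.1]
    speech = [0.7, 0.9, 0.3, 0.1]
    fused = weighted_sum_fusion(face, speech, face_weight=0.5)
    print(fused)
    print(equal_error_rate(fused[:2], fused[2:]))
```

In practice the weight would typically be tuned on held-out data so that the more reliable modality contributes more to the fused score. The scores in the usage example are invented and do not reproduce the figures reported in the abstract.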