Multimodal person authentication using speech, face and visual speech

Authors:
S. Palanivel;B. Yegnanarayana
Affiliations:
Speech and Vision Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India;Speech and Vision Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
Venue:
Computer Vision and Image Understanding
Year:
2008

Abstract

This paper presents a method for automatic multimodal person authentication using speech, face and visual speech modalities. The proposed method uses motion information to localize the face region, and the face region is processed in the YCrCb color space to determine the locations of the eyes. The system models the nonlip region of the face using a Gaussian distribution, and this model is used to estimate the center of the mouth. Facial and visual speech features are extracted using multiscale morphological erosion and dilation operations, respectively. The facial features are extracted relative to the locations of the eyes, and the visual speech features are extracted relative to the locations of the eyes and mouth. Acoustic features are derived from the speech signal and are represented by weighted linear prediction cepstral coefficients (WLPCC). Autoassociative neural network (AANN) models are used to capture the distribution of the extracted acoustic, facial and visual speech features. The evidence from the speech, face and visual speech models is combined using a weighting rule, and the result is used to accept or reject the identity claim of the subject. The performance of the system is evaluated for newsreaders in TV broadcast news data, and the system achieves an equal error rate (EER) of about 0.45% for 50 subjects.
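
The last two steps of the abstract, combining per-modality evidence and reporting an equal error rate, can be illustrated with a minimal sketch. The abstract does not state the form of the weighting rule, so a normalized weighted sum is assumed here. The function names `fuse_scores` and `equal_error_rate`, the default weights and the example scores are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fuse_scores(speech, face, visual, weights=(0.5, 0.3, 0.2)):
    """Combine per-modality confidence scores with a weighted sum.

    Assumption: the paper's weighting rule is not given in the abstract,
    so a normalized weighted sum with placeholder weights is used here.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    scores = np.stack([speech, face, visual], axis=-1).astype(float)
    return scores @ w

def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A claim is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false acceptance rate (FAR) and the
    false rejection rate (FRR) are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine < t)    # genuine claims rejected
        far = np.mean(impostor >= t)  # impostor claims accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

if __name__ == "__main__":
    # Synthetic scores for illustration only; not data from the paper.
    genuine = fuse_scores(np.array([0.9, 0.8]), np.array([0.7, 0.9]), np.array([0.8, 0.6]))
    impostor = fuse_scores(np.array([0.2, 0.4]), np.array([0.3, 0.1]), np.array([0.5, 0.2]))
    print("EER:", equal_error_rate(genuine, impostor))
```

In a sketch like this the weights would typically be tuned on held-out data, and the threshold sweep is a coarse approximation that becomes more accurate as the number of genuine and impostor trials grows.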