Audio-video biometric recognition for non-collaborative access granting

Authors:
Christian Micheloni;Sergio Canazza;Gian Luca Foresti
Affiliations:
Department of Computer Science, University of Udine, Via delle Scienze 206, 33100 Udine, Italy;Department of Historical and Documentary Sciences, University of Udine, Via Petracco 8, 33100 Udine, Italy;Department of Computer Science, University of Udine, Via delle Scienze 206, 33100 Udine, Italy
Venue:
Journal of Visual Languages and Computing
Year:
2009

Abstract

In this paper, we address the problem of non-collaborative person identification for secure access to facilities. The proposed solution combines face recognition and speaker recognition techniques, and integrating the two improves performance with respect to either classifier used alone. In non-collaborative scenarios, face recognition first requires detecting the face pattern and then recognizing it, even in non-frontal poses. In the current work, histogram normalization, a boosting technique and linear discriminant analysis have been exploited to address typical problems such as illumination variability, occlusions and pose variation. In addition, a new temporal classification scheme is proposed to improve the robustness of frame-by-frame classification. This makes it possible to extend known still-image classification techniques to a multi-frame context in which the environment may change during image capture. For the audio, a method for automatic speaker identification in noisy environments is presented. In particular, we propose an optimization of a speech de-noising algorithm that improves the performance of the extended Kalman filter (EKF). As a baseline system into which the proposed de-noising algorithm is integrated, we use a conventional speaker recognition system based on Gaussian mixture models (GMMs) with mel-frequency cepstral coefficients (MFCCs) as features. To confirm the effectiveness of our methods, we performed video and speaker recognition experiments, first separately and then integrating the results. Two different corpora have been used: (a) a public corpus (ELDSR for audio and FERET for images) and (b) a dedicated audio/video corpus in which the speakers read a list of sentences while wearing a scarf or a full-face motorcycle helmet. Experimental results show that our methods significantly reduce the classification error rate.
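The abstract names only the audio baseline: a conventional GMM speaker recognizer over MFCC features. As an illustration of how such a baseline usually scores an utterance in closed-set identification, here is a minimal Python sketch. It assumes diagonal-covariance mixtures already trained per speaker (e.g. by EM) and MFCC vectors already extracted; the names `DiagonalGMM` and `identify_speaker` are hypothetical. Neither the EKF de-noising front end nor the face/speaker integration described in the paper is modeled here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagonalGMM:
    """Hypothetical per-speaker model: a GMM with diagonal covariances.

    Assumes the parameters were already trained (e.g. by EM on MFCC frames).
    """
    weights: List[float]          # mixture weights, assumed positive and summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x: Sequence[float]) -> float:
        """Return log p(x | speaker) for one feature vector."""
        log_terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Log-density of a diagonal Gaussian: sum of per-dimension terms.
            log_gauss = -0.5 * sum(
                math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            log_terms.append(math.log(w) + log_gauss)
        # Log-sum-exp over components avoids numerical underflow.
        peak = max(log_terms)
        return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def identify_speaker(frames: Sequence[Sequence[float]],
                     models: Dict[str, DiagonalGMM]) -> str:
    """Closed-set identification: pick the speaker with the highest total score.

    Frames are treated as independent, so per-frame log-likelihoods add up.
    """
    def total_log_likelihood(name: str) -> float:
        return sum(models[name].log_likelihood(f) for f in frames)

    return max(models, key=total_log_likelihood)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for MFCC frames (illustrative values only).
    models = {
        "alice": DiagonalGMM([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]],
                             [[1.0, 1.0], [1.0, 1.0]]),
        "bob": DiagonalGMM([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    frames = [[0.2, 0.1], [0.9, 1.1], [0.5, 0.4]]
    print(identify_speaker(frames, models))  # prints: alice
```

Summing per-frame log-likelihoods reflects the usual frame-independence assumption of GMM speaker models; with real features each vector would carry one entry per cepstral coefficient, and a de-noising stage such as the paper's EKF-based one would run before feature extraction.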