Synergy of Lip-Motion and Acoustic Features in Biometric Speech and Speaker Recognition

Authors:
Maycel-Isaac Faraj; Josef Bigun
Venue:
IEEE Transactions on Computers
Year:
2007

Abstract

This paper presents the design and evaluation of a robust audio-visual digit and speaker recognition system that uses lip-motion and speech biometrics. A liveness verification barrier based on a person's lip movement is added to the system to guard against advanced spoofing attempts such as replayed videos. The acoustic and visual features are integrated at the feature level and evaluated first by a Support Vector Machine for digit and speaker identification and then by a Gaussian Mixture Model for speaker verification. Based on approximately 300 different personal identities, this paper represents, to our knowledge, the first extensive study of the added value of lip-motion features for speaker and speech recognition applications. Digit recognition, person identification, and verification experiments are conducted on the publicly available XM2VTS database and show favorable results: 98 percent for speaker verification, 100 percent for speaker identification, and 83 to 100 percent for digit identification.
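
As an illustration of what integration "at the feature level" can mean, the sketch below concatenates per-frame acoustic and lip-motion vectors into a single vector before any classifier sees them. It is a minimal sketch, not the authors' implementation: the function name `fuse_features`, the feature dimensions, and the toy values are all assumptions made for the example, and the abstract does not say how the two streams are aligned in time, so the code simply requires equal frame counts.

```python
from typing import List, Sequence

Frame = List[float]


def fuse_features(acoustic: Sequence[Sequence[float]],
                  visual: Sequence[Sequence[float]]) -> List[Frame]:
    """Feature-level fusion by concatenation (hypothetical helper).

    Each output frame is the acoustic vector followed by the visual
    (lip-motion) vector for the same time step.
    """
    # Assumption: both streams were already resampled to the same frame rate.
    if len(acoustic) != len(visual):
        raise ValueError("acoustic and visual streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(acoustic, visual)]


# Toy, made-up values: 2 frames, 3 acoustic and 2 lip-motion features per frame.
acoustic = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
visual = [[1.0, 2.0], [3.0, 4.0]]

fused = fuse_features(acoustic, visual)
print(fused)
```

The fused frames would then be passed to a single classifier (in the abstract, a Support Vector Machine for identification and a Gaussian Mixture Model for verification), in contrast to decision-level fusion, where each modality is scored separately and only the scores are combined.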