We present a new multimodal speaker/speech recognition system that integrates audio, lip texture, and lip motion modalities. Fusion of audio and face texture modalities has been investigated in the literature before; the emphasis of this work is on the benefits of including the lip motion modality in two distinct cases: speaker recognition and speech recognition. The audio modality is represented by the well-known mel-frequency cepstral coefficients (MFCC) along with their first and second derivatives, whereas the lip texture modality is represented by the 2D-DCT coefficients of the luminance component within a bounding box around the lip region. We employ a new lip motion representation for speaker/speech recognition, based on discriminative analysis of the dense motion vectors within the same bounding box. The audio, lip texture, and lip motion modalities are fused by the reliability weighted summation (RWS) decision rule. Experimental results show that including the lip motion modality provides further performance gains over the fusion of audio and lip texture alone, in both speaker identification and isolated word recognition scenarios.
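
The abstract names the RWS decision rule without defining it, so the following is only a minimal sketch of the general idea of a reliability-weighted sum over modality scores, not the authors' formulation. It assumes each modality already yields one comparable score per class (for example, a normalized log-likelihood) and that a non-negative reliability value per modality is available from elsewhere. The function names, the normalization of weights to sum to one, and the example numbers are all hypothetical.

```python
from typing import Dict, List


def reliability_weighted_sum(
    scores: Dict[str, List[float]],
    reliabilities: Dict[str, float],
) -> List[float]:
    """Fuse per-class scores from several modalities by a weighted sum.

    scores[m][k] is the score that modality m assigns to class k;
    reliabilities[m] is a non-negative confidence value for modality m.
    Both are hypothetical inputs (assumption, not taken from the paper).
    """
    total = sum(reliabilities[m] for m in scores)
    if total <= 0:
        raise ValueError("at least one modality needs positive reliability")
    n_classes = len(next(iter(scores.values())))
    fused = [0.0] * n_classes
    for modality, class_scores in scores.items():
        if len(class_scores) != n_classes:
            raise ValueError("all modalities must score the same classes")
        # Assumed normalization: the weights are scaled to sum to 1.
        weight = reliabilities[modality] / total
        for k, s in enumerate(class_scores):
            fused[k] += weight * s
    return fused


def decide(fused: List[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative values only: two classes, three modalities.
example_scores = {
    "audio": [0.2, 0.8],
    "lip_texture": [0.6, 0.4],
    "lip_motion": [0.7, 0.3],
}
example_reliabilities = {"audio": 1.0, "lip_texture": 1.0, "lip_motion": 2.0}
example_fused = reliability_weighted_sum(example_scores, example_reliabilities)
example_choice = decide(example_fused)
```

In this sketch a modality with low reliability (for instance, audio under heavy noise) contributes less to the fused score, which is the intuition behind weighting modalities rather than summing them equally.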