Robust multimodal identification systems based on audio-visual information have not yet been thoroughly investigated. The aim of this work is to propose a model-based feature extraction method that exploits the physiological characteristics of the facial muscles producing lip movements. The approach uses intrinsic muscle properties such as viscosity, elasticity, and mass, which are extracted from a dynamic lip model. Because these parameters depend exclusively on the neuromuscular properties of the speaker, imitation of valid speakers can be reduced to a large extent. The extracted parameters are applied to a Hidden Markov Model (HMM) audio-visual identification system. In this work, audio and video features are combined through a multistream pseudo-synchronized HMM training method. The proposed model is compared with other feature extraction methods, including Kalman filtering, neural networks, the adaptive network-based fuzzy inference system (ANFIS), and the autoregressive moving average (ARMA) model. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a phonetically rich sentence. The combination of Kalman filtering and the proposed model leads to the best performance. The phonetic content of the pronounced sentences is also evaluated to determine the phonetic combinations that yield the best identification rate.
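As an illustrative sketch of the idea behind the feature extraction, a dynamic lip model driven by muscle mass, viscosity, and elasticity can be approximated by a second-order system, m·x″ + b·x′ + k·x = F(t), and the speaker-specific parameters (m, b, k) recovered from an observed lip-displacement trajectory by least squares. The equation form, signal names, and forcing term below are assumptions for demonstration, not the paper's actual formulation:

```python
import numpy as np

def estimate_muscle_params(x, force, dt):
    """Fit (m, b, k) so that m*x'' + b*x' + k*x ~= force, via least squares.

    x     : observed lip-displacement trajectory (1-D array)
    force : assumed muscle forcing signal (1-D array, same length)
    dt    : sampling interval in seconds
    """
    v = np.gradient(x, dt)             # numerical velocity
    a = np.gradient(v, dt)             # numerical acceleration
    A = np.column_stack([a, v, x])     # regressors for m, b, k
    params, *_ = np.linalg.lstsq(A, force, rcond=None)
    return params                      # array [m, b, k]

# Synthetic demonstration: build a trajectory from a known system,
# then recover its (hypothetical) neuromuscular parameters.
dt, n = 0.01, 500
t = np.arange(n) * dt
m_true, b_true, k_true = 1.0, 0.8, 4.0
# Two frequencies keep the regressor matrix well-conditioned.
x = np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 1.3 * t)
v = np.gradient(x, dt)
a = np.gradient(v, dt)
force = m_true * a + b_true * v + k_true * x

m, b, k = estimate_muscle_params(x, force, dt)
print(m, b, k)
```

In an identification pipeline along the lines described above, the fitted (m, b, k) vector per lip segment would then serve as the visual feature stream fed to the HMM alongside the audio features.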