Audio-visual speech recognition (AVSR), which uses the acoustic and visual signals of speech, has received attention recently because of its robustness in noisy environments. An important issue in decision-fusion-based AVSR systems is determining an appropriate integration weight for the two speech modalities so that their combination performs well under various SNR conditions. The integration weight is generally calculated from the relative reliability of the two modalities. This paper investigates the effect of the reliability measure on integration weight estimation and proposes a genetic algorithm (GA) based reliability measure that uses an optimal number of best recognition hypotheses, rather than a fixed N best hypotheses, to determine an appropriate integration weight. Recognition accuracy is improved further by optimizing this integration weight with a genetic algorithm. The performance of the proposed integration weight estimation scheme is demonstrated on an isolated-word recognition task (covering functions commonly used in mobile phones) with a multi-speaker database. The results show that the proposed schemes improve recognition accuracy over conventional unimodal systems and two related bimodal systems, namely the baseline reliability-ratio-based system and the N-best-hypotheses reliability-ratio-based system, under various SNR conditions.
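The reliability-ratio weighting described above can be sketched as follows. This is a minimal illustration of the common dispersion-based reliability measure and late-integration rule, not the paper's GA-optimized variant; the function names and toy scores are hypothetical.

```python
def reliability(nbest_loglik):
    """Dispersion-based reliability: mean gap between the top
    hypothesis score and each remaining N-best score. A flat
    N-best list (small gaps) signals an unreliable modality."""
    ranked = sorted(nbest_loglik, reverse=True)
    top, rest = ranked[0], ranked[1:]
    return sum(top - s for s in rest) / len(rest)

def fusion_weight(audio_nbest, visual_nbest):
    """Integration weight lam from the relative reliability of the
    two modalities (the reliability-ratio rule)."""
    ra = reliability(audio_nbest)
    rv = reliability(visual_nbest)
    return ra / (ra + rv)

def fuse(audio_scores, visual_scores, lam):
    """Late integration: per-word weighted sum of log-likelihoods."""
    return {w: lam * audio_scores[w] + (1 - lam) * visual_scores[w]
            for w in audio_scores}

# Toy example: audio N-best scores are well separated (reliable),
# visual scores are nearly flat (unreliable), so lam > 0.5 and the
# fused decision leans on the acoustic modality.
audio = {"call": -10.0, "send": -25.0, "open": -30.0}
visual = {"call": -12.0, "send": -13.0, "open": -13.5}
lam = fusion_weight(list(audio.values()), list(visual.values()))
fused = fuse(audio, visual, lam)
best = max(fused, key=fused.get)
```

Under clean acoustic conditions lam approaches 1 and the system behaves like an audio-only recognizer; as acoustic SNR drops, the audio N-best list flattens, lam shrinks, and the visual stream carries more of the decision.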