The M2VTS Multimodal Face Database (Release 1.00)
AVBPA '97 Proceedings of the First International Conference on Audio- and Video-Based Biometric Person Authentication
Articulatory features for robust visual speech recognition
Proceedings of the 6th international conference on Multimodal interfaces
Audio-visual speech modeling for continuous speech recognition
IEEE Transactions on Multimedia
Using redundant speech and handwriting for learning new vocabulary and understanding abbreviations
Proceedings of the 8th international conference on Multimodal interfaces
Non-parametric and light-field deformable models
Computer Vision and Image Understanding
Voiceless speech recognition using dynamic visual speech features
VisHCI '06 Proceedings of the HCSNet workshop on Use of vision in human-computer interaction - Volume 56
Local spatiotemporal descriptors for visual recognition of spoken phrases
Proceedings of the international workshop on Human-centered multimedia
Visual recognition of speech consonants using facial movement features
Integrated Computer-Aided Engineering - Informatics in Control, Automation and Robotics
The CAVA corpus: synchronised stereoscopic and binaural datasets with head movements
ICMI '08 Proceedings of the 10th international conference on Multimodal interfaces
Multimodal Signals: Cognitive and Algorithmic Issues
Lipreading with local spatiotemporal descriptors
IEEE Transactions on Multimedia
Lips shape extraction via active shape model and local binary pattern
MICAI'07 Proceedings of the artificial intelligence 6th Mexican international conference on Advances in artificial intelligence
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on gait analysis
Speaker identification and speech recognition using phased arrays
Ambient Intelligence in Everyday Life
Speech audio retrieval using voice query
ICADL'06 Proceedings of the 9th international conference on Asian Digital Libraries: achievements, Challenges and Opportunities
Proximity-Based order-respecting intersection for searching in image databases
AMR'10 Proceedings of the 8th international conference on Adaptive Multimedia Retrieval: context, exploration, and fusion
Dynamic units of visual speech
EUROSCA'12 Proceedings of the 11th ACM SIGGRAPH / Eurographics conference on Computer Animation
Dynamic units of visual speech
Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation
This paper presents the development and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that uses a segment-based modeling strategy. To support this research, we collected a new video corpus, called Audio-Visual TIMIT (AV-TIMIT), which consists of a total of 4 hours of read speech from 223 different speakers. We used this corpus to evaluate our AVSR system, which incorporates a novel audio-visual integration scheme based on segment-constrained Hidden Markov Models (HMMs). Preliminary experiments show improvements in phonetic recognition performance when visual information is incorporated into the speech recognition process.
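The abstract does not say how the segment-constrained HMMs combine the audio and visual streams. For orientation only, the sketch below shows a common, generic form of audio-visual integration: weighted log-likelihood fusion, log p_av = w · log p_a + (1 − w) · log p_v, applied to competing phonetic hypotheses for one segment. This is not the authors' scheme. The function names, the weight of 0.7, and all scores are hypothetical. The example shows audio evidence that slightly favors /n/, while visual evidence (lip closure) favors the bilabial /m/.

```python
from typing import Dict


def fuse_segment_scores(audio_loglik: Dict[str, float],
                        visual_loglik: Dict[str, float],
                        audio_weight: float = 0.7) -> Dict[str, float]:
    """Hypothetical weighted fusion of per-unit log-likelihoods for one segment.

    Only units scored by both streams are kept. The weight is an assumed,
    fixed stream weight, not a value from the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    shared_units = audio_loglik.keys() & visual_loglik.keys()
    return {
        unit: audio_weight * audio_loglik[unit]
        + (1.0 - audio_weight) * visual_loglik[unit]
        for unit in shared_units
    }


def best_unit(scores: Dict[str, float]) -> str:
    """Return the unit label with the highest (least negative) log score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Illustrative numbers only: audio barely prefers /n/, video clearly prefers /m/.
    audio = {"m": -3.9, "n": -3.8}
    visual = {"m": -0.5, "n": -2.5}
    fused = fuse_segment_scores(audio, visual)
    print("audio only:", best_unit(audio))     # n
    print("audio-visual:", best_unit(fused))   # m
```

In practice the stream weight is often tuned on held-out data or made dependent on acoustic noise conditions. A single fixed weight keeps the sketch minimal.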