Audio-visual speech recognition, the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments such as an automotive cabin. This paper extends the established audio-visual speech recognition literature by showing that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side and centrally oriented cameras to visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM yields a relative improvement of over 56% in word recognition accuracy compared with the acoustic-only approach in the noisiest conditions of the AVICAR database.
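The following is a minimal Python sketch of one common way a synchronous multi-stream HMM is scored, to make the stream fusion concrete. All streams share a single state sequence. Each state's emission log-likelihood is a weighted sum of per-stream log-likelihoods, log b_j(o_t) = Σ_s w_s log b_js(o_st), and decoding is ordinary log-domain Viterbi over the combined scores. The function names, the stream weights, the two-state left-to-right topology and the toy scores are all hypothetical. The abstract does not say how the paper's stream weights were chosen or which per-stream models produced the log-likelihoods.

```python
import math
from typing import List, Sequence, Tuple


def shmm_log_emission(stream_loglikes: Sequence[float], weights: Sequence[float]) -> float:
    # Hypothetical helper: combine one state's per-stream log-likelihoods for one frame.
    # log b_j(o_t) = sum_s w_s * log b_js(o_st), i.e. a product of per-stream
    # likelihoods, each raised to its stream weight.
    if len(stream_loglikes) != len(weights):
        raise ValueError("expected one weight per stream")
    return sum(w * ll for w, ll in zip(weights, stream_loglikes))


def combine_streams(stream_scores: Sequence[Sequence[Sequence[float]]],
                    weights: Sequence[float]) -> List[List[float]]:
    # stream_scores[t][j][s]: log-likelihood of stream s at frame t under state j.
    return [[shmm_log_emission(state, weights) for state in frame]
            for frame in stream_scores]


def viterbi(log_init: Sequence[float], log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    # Standard log-domain Viterbi over the combined (already fused) emission scores.
    n = len(log_init)
    delta = [log_init[j] + log_emit[0][j] for j in range(n)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n):
            # Best predecessor state i for entering state j at frame t.
            best_i = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
            ptr.append(best_i)
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


if __name__ == "__main__":
    # Toy example: 1 audio stream + 4 visual streams, hypothetical weights.
    weights = [0.6, 0.1, 0.1, 0.1, 0.1]
    good, bad = [-1.0] * 5, [-3.0] * 5
    scores = [[good, bad], [bad, good], [bad, good]]  # 3 frames x 2 states x 5 streams
    NEG_INF = -math.inf
    log_init = [0.0, NEG_INF]  # must start in state 0
    log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]  # left-to-right
    score, path = viterbi(log_init, log_trans, combine_streams(scores, weights))
    print(path, round(score, 4))  # prints: [0, 1, 1] -3.6931
```

Because every stream is tied to one shared state sequence, the weights are the only fusion parameter in this sketch. Here they are assumed non-negative and summing to one. How they would be set for each AVICAR noise condition is not specified in the abstract.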