The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in its audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare their performance with that of the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming both the FHMM and all the existing models.
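To make the coupling idea concrete, the following is a minimal sketch of the forward recursion for a two-chain coupled HMM. It is an illustration under stated assumptions, not the implementation evaluated in the paper: observations are discrete symbols, and every name and number (`chmm_likelihood`, `TRANS_A`, `EMIT_V`, and so on) is hypothetical. In a coupled HMM, each chain's next state is conditioned on the previous states of both chains. The audio and visual chains can therefore occupy different states at the same time step (asynchrony) while still influencing each other (correlation).

```python
from itertools import product


def chmm_likelihood(pi_a, pi_v, trans_a, trans_v, emit_a, emit_v, obs_a, obs_v):
    """Likelihood of a paired observation sequence under a two-chain coupled HMM.

    Assumed (hypothetical) parameter layout:
      trans_a[i][j][k] = P(audio state k at t | audio state i, video state j at t-1)
      trans_v[i][j][l] = P(video state l at t | audio state i, video state j at t-1)
      emit_a[k][o]     = P(audio symbol o | audio state k)
      emit_v[l][o]     = P(video symbol o | video state l)
    """
    pairs = list(product(range(len(pi_a)), range(len(pi_v))))

    # Initialisation: the two chains start independently and each emits
    # its own first symbol.
    alpha = {
        (i, j): pi_a[i] * pi_v[j] * emit_a[i][obs_a[0]] * emit_v[j][obs_v[0]]
        for i, j in pairs
    }

    # Recursion over the joint state (audio state, video state).
    for o_a, o_v in zip(obs_a[1:], obs_v[1:]):
        new_alpha = {}
        for k, l in pairs:
            # Each chain's transition depends on BOTH previous states.
            reach = sum(
                alpha[i, j] * trans_a[i][j][k] * trans_v[i][j][l]
                for i, j in pairs
            )
            new_alpha[k, l] = reach * emit_a[k][o_a] * emit_v[l][o_v]
        alpha = new_alpha

    # Termination: marginalise over all joint end states.
    return sum(alpha.values())


# Toy parameters (made up for illustration): 2 states and 2 symbols per chain.
PI_A = [0.6, 0.4]
PI_V = [0.5, 0.5]
TRANS_A = [[[0.7, 0.3], [0.6, 0.4]], [[0.2, 0.8], [0.1, 0.9]]]
TRANS_V = [[[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]
EMIT_A = [[0.9, 0.1], [0.2, 0.8]]
EMIT_V = [[0.7, 0.3], [0.4, 0.6]]

if __name__ == "__main__":
    print(chmm_likelihood(PI_A, PI_V, TRANS_A, TRANS_V, EMIT_A, EMIT_V,
                          [0, 1, 1], [0, 0, 1]))
```

For isolated word recognition, one such model would typically be trained per word, and a test utterance assigned to the word whose model gives the highest likelihood. The recursion above costs on the order of the square of the number of joint states per time step, which stays manageable for the small per-word state counts assumed here.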