Dynamic Bayesian networks for audio-visual speech recognition

  • Authors:
  • Ara V. Nefian (Intel Corporation, Microprocessor Research Labs, Santa Clara, CA)
  • Luhong Liang (Intel Corporation, Microprocessor Research Labs, Chaoyang District, Beijing, China)
  • Xiaobo Pi (Intel Corporation, Microprocessor Research Labs, Chaoyang District, Beijing, China)
  • Xiaoxing Liu (Intel Corporation, Microprocessor Research Labs, Chaoyang District, Beijing, China)
  • Kevin Murphy (Computer Science Division, University of California, Berkeley, Berkeley, CA)

  • Venue:
  • EURASIP Journal on Applied Signal Processing
  • Year:
  • 2002

Abstract

The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in its audio and visual representations, and by the need for features that are invariant to perturbation by acoustic noise. As a result, current AVSR systems demonstrate significant accuracy improvements in acoustically noisy environments. In this paper, we describe two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare their performance with that of existing models on a speaker-dependent, audio-visual, isolated-word recognition task. The statistical structure of both the CHMM and the FHMM allows them to model the state asynchrony between the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming both the existing models and the FHMM.
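
For orientation, here is a minimal sketch of how the two factorizations differ; the notation (audio and visual hidden states q_t^a and q_t^v, observations o_t) is ours, not taken from the paper.

    CHMM (coupled transitions, per-stream emissions):
        P(q_t^a | q_{t-1}^a, q_{t-1}^v),  P(q_t^v | q_{t-1}^a, q_{t-1}^v)
        P(o_t^a | q_t^a) P(o_t^v | q_t^v)

    FHMM (independent transitions, joint emission):
        P(q_t^a | q_{t-1}^a) P(q_t^v | q_{t-1}^v)
        P(o_t | q_t^a, q_t^v)

In both cases the decoded state pair (q_t^a, q_t^v) is free to drift out of lockstep, which is what allows the models to capture audio-visual state asynchrony, while the coupling (through the transitions in the CHMM, through the emission in the FHMM) preserves the correlation between the two streams.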