Audio-visual speaker verification using continuous fused HMMs
VisHCI '06 Proceedings of the HCSNet workshop on Use of vision in human-computer interaction - Volume 56
This paper presents a novel fused hidden Markov model (fused HMM) for integrating tightly coupled time series, such as the audio and visual features of speech. In this model, the two time series are first modeled separately by two conventional HMMs. The resulting HMMs are then fused together using a probabilistic fusion model, which is optimal according to the maximum entropy principle and a maximum mutual information criterion. Simulations and bimodal speaker verification experiments show that the proposed model significantly reduces recognition errors in both noiseless and noisy environments.
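The abstract's fusion idea can be illustrated with a rough sketch, not the paper's actual implementation: assuming discrete observations, known HMM parameters, and a given coupling distribution `C[s, v]` approximating P(video symbol | audio state), one simplified fused score combines the audio HMM's likelihood with the video observations evaluated along the audio HMM's best state path. All parameter values and function names below are illustrative assumptions.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    # Scaled forward algorithm: log-likelihood of obs under a discrete HMM.
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()
    return ll

def viterbi(pi, A, B, obs):
    # Most likely state path (log domain) for a discrete HMM.
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]
    back = []
    for o in obs[1:]:
        trans = delta[:, None] + logA
        back.append(trans.argmax(axis=0))
        delta = trans.max(axis=0) + logB[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

def fused_score(pi, A, B, C, audio_obs, video_obs):
    # Simplified fused-HMM-style score: audio HMM likelihood plus the
    # video stream scored against the coupling distribution C[s, v]
    # along the audio HMM's Viterbi path (an assumed simplification
    # of the paper's probabilistic fusion model).
    ll = forward_loglik(pi, A, B, audio_obs)
    path = viterbi(pi, A, B, audio_obs)
    return ll + sum(np.log(C[s, v]) for s, v in zip(path, video_obs))

# Toy 2-state example (all numbers are made up for illustration).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])        # state transitions
B = np.array([[0.9, 0.1], [0.2, 0.8]])        # audio emissions
C = np.array([[0.8, 0.2], [0.3, 0.7]])        # P(video symbol | state)
audio = [0, 0, 1]
matched = fused_score(pi, A, B, C, audio, [0, 0, 1])
mismatched = fused_score(pi, A, B, C, audio, [1, 1, 0])
```

With the coupling term included, video observations consistent with the audio-inferred states score higher than inconsistent ones, which is the intuition behind fusing the two single-stream HMMs rather than scoring them independently.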