Audio–Visual Affective Expression Recognition Through Multistream Fused HMM

  • Authors:
  • Zhihong Zeng; Jilin Tu; B. M. Pianfetti; T. S. Huang

  • Affiliations:
  • Beckman Institute, University of Illinois at Urbana-Champaign (UIUC), Urbana, IL

  • Venue:
  • IEEE Transactions on Multimedia
  • Year:
  • 2008

Abstract

Advances in computer processing power and emerging algorithms are allowing new ways of envisioning human-computer interaction. Although the benefit of audio-visual fusion for affect recognition is expected from both psychological and engineering perspectives, most existing approaches to automatic human affect analysis are unimodal: the information processed by the computer system is limited to either face images or speech signals. This paper focuses on the development of a computing algorithm that uses both audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our multistream fused hidden Markov model (MFHMM), we analyzed coupled audio and visual streams to detect four cognitive states (interest, boredom, frustration, and puzzlement) and seven prototypical emotions (neutral, happiness, sadness, anger, disgust, fear, and surprise). The MFHMM allows the building of an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experimental results from 20 subjects in 660 sequences show that the MFHMM approach outperforms face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion, under both clean and varying audio-channel noise conditions.
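
To make the multistream setup concrete, the sketch below shows a simplified per-stream HMM classifier with independent-fusion scoring (summing stream log-likelihoods), i.e., the "independent HMM fusion" baseline the abstract compares against, not the authors' MFHMM, which additionally learns coupling between streams via the maximum entropy principle and the maximum mutual information criterion. The stream names, feature dimensions, class labels, and hyperparameters here are illustrative assumptions, and hmmlearn is used only as a convenient stand-in HMM library.

```python
# Minimal sketch (assumed setup): one Gaussian HMM per class and per stream
# (face features, pitch, energy), classification by summing per-stream
# log-likelihoods. The actual MFHMM also models cross-stream dependencies,
# which this sketch deliberately omits.
import numpy as np
from hmmlearn import hmm

STREAMS = {"face": 12, "pitch": 1, "energy": 1}   # assumed feature dimensions
CLASSES = ["interest", "boredom", "frustration", "puzzlement"]

def train_stream_hmms(train_data, n_states=3):
    """train_data[label][stream] -> list of (T_i, dim) feature sequences."""
    models = {}
    for label in CLASSES:
        models[label] = {}
        for stream in STREAMS:
            seqs = train_data[label][stream]
            X = np.concatenate(seqs)              # stack sequences for hmmlearn
            lengths = [len(s) for s in seqs]      # per-sequence lengths
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
            m.fit(X, lengths)
            models[label][stream] = m
    return models

def classify(models, obs):
    """obs[stream] -> (T, dim) sequence; pick class with max summed log-likelihood."""
    scores = {label: sum(models[label][s].score(obs[s]) for s in STREAMS)
              for label in CLASSES}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 5 sequences of 40 frames per class and stream.
    train = {c: {s: [rng.normal(i, 1.0, size=(40, d)) for _ in range(5)]
                 for s, d in STREAMS.items()}
             for i, c in enumerate(CLASSES)}
    models = train_stream_hmms(train)
    test_obs = {s: rng.normal(0.0, 1.0, size=(40, d)) for s, d in STREAMS.items()}
    print("predicted state:", classify(models, test_obs))
```

In this simplified baseline, each stream's HMM is trained in isolation and the fusion happens only at scoring time; the paper's contribution is precisely to replace that independence assumption with learned connections between the audio and visual component HMMs.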