A robust multimodal approach for emotion recognition

Authors:
Mingli Song;Mingyu You;Na Li;Chun Chen
Affiliations:
College of Computer Science, Zhejiang University, China;College of Computer Science, Zhejiang University, China;College of Computer Science, Zhejiang University, China;College of Computer Science, Zhejiang University, China
Venue:
Neurocomputing
Year:
2008

Abstract

Emotion recognition is one of the latest challenges in intelligent human/computer communication. Most previous work on emotion recognition has focused on extracting emotions from visual or audio information separately. This paper presents a novel approach that uses both the visual and the audio information in video clips to recognize human emotion. Facial feature tracking compliant with the Facial Animation Parameters (FAPs) and based on GASM (GPU-based Active Shape Model) is performed on the video to generate two vector streams, one representing the expression feature and the other the visual speech feature. To extract effective speech features, we develop an enhanced Lipschitz embedding, based on geodesic distance estimation, that embeds high-dimensional acoustic features into a low-dimensional space. The audio vector, extracted from these low-dimensional features, is then combined with the two visual vectors. Finally, a tripled Hidden Markov Model is introduced to perform the recognition; it allows state asynchrony between the audio and visual observation sequences while preserving their natural correlation over time. The experimental results show that this approach outperforms conventional approaches to emotion recognition.
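
The abstract does not spell out the enhanced Lipschitz embedding, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes geodesic distances are approximated by shortest paths on a k-nearest-neighbour graph (as in Isomap) and that each embedding coordinate is a point's distance to the closest member of a reference set. The function names, the choice of reference sets, and the toy data are all hypothetical.

```python
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances by shortest paths on a k-NN graph.

    Assumption: this Isomap-style estimate stands in for whatever geodesic
    estimation the paper actually uses.
    """
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    # Start with no edges except zero-length self loops.
    graph = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Connect each point to its k nearest neighbours (symmetrised).
        others = sorted((j for j in range(n) if j != i), key=lambda j: euclid[i][j])
        for j in others[:k]:
            graph[i][j] = graph[j][i] = euclid[i][j]
    # Floyd-Warshall all-pairs shortest paths; O(n^3), fine for a small demo.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph


def lipschitz_embedding(dist, reference_sets):
    """Coordinate j of point i is its distance to the nearest member of set j."""
    return [[min(row[a] for a in ref) for ref in reference_sets] for row in dist]


# Hypothetical toy data: points along a half circle, so that the graph
# distance between the two ends is longer than the straight-line chord.
arc = [(math.cos(t * math.pi / 10), math.sin(t * math.pi / 10)) for t in range(11)]
dist = geodesic_distances(arc, k=2)
# Hypothetical reference sets: the two ends and the middle of the arc.
embedded = lipschitz_embedding(dist, [[0], [5], [10]])
print(len(embedded), "points embedded into", len(embedded[0]), "dimensions")
```

The output dimension equals the number of reference sets, so choosing a handful of sets is what reduces a high-dimensional acoustic feature vector to a short one. How the reference sets are chosen in the paper is not stated in the abstract.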