Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th International Conference on Multimodal Interfaces.
The eNTERFACE'05 Audio-Visual Emotion Database. In ICDEW '06: Proceedings of the 22nd International Conference on Data Engineering Workshops.
3D Facial Expression Recognition Based on Primitive Surface Feature Distribution. In CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2.
Audio-visual emotion recognition in adult attachment interview. In Proceedings of the 8th International Conference on Multimodal Interfaces.
Audiovisual recognition of spontaneous interest within conversations. In Proceedings of the 9th International Conference on Multimodal Interfaces.
A robust multimodal approach for emotion recognition. Neurocomputing.
Audio-Visual Emotion Recognition Based on a DBN Model with Constrained Asynchrony. In ICIG '09: Proceedings of the 2009 Fifth International Conference on Image and Graphics.
Robust shape-based head tracking. In ACIVS '07: Proceedings of the 9th International Conference on Advanced Concepts for Intelligent Vision Systems.
Audio-Visual Affective Expression Recognition Through Multistream Fused HMM. IEEE Transactions on Multimedia.
Proceedings of the 14th ACM international conference on Multimodal interaction
We present a triple-stream DBN model (T_AsyDBN) for audio-visual emotion recognition, in which the two audio feature streams are synchronous with each other but asynchronous with the visual feature stream, within controllable constraints. MFCC features and principal component analysis (PCA) coefficients of local prosodic features are used for the audio streams. For the visual stream, 2D facial features as well as 3D facial animation unit features are defined and concatenated, and the feature dimensionality is reduced by PCA. Emotion recognition experiments on the eNTERFACE'05 database show that by adjusting the asynchrony constraint, the proposed T_AsyDBN model achieves a recognition rate 18.73% higher than the traditional multi-stream state-synchronous HMM (MSHMM) and 10.21% higher than the two-stream asynchronous DBN model (Asy_DBN).
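The visual feature pipeline described in the abstract (concatenating 2D facial features with 3D facial animation unit features, then reducing the dimensionality with PCA) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature counts, frame count, and target dimensionality are made-up placeholders, and the random arrays stand in for real per-frame facial measurements.

```python
import numpy as np

def pca_reduce(X, n_components):
    # Center the data, then project onto the top principal axes.
    # The rows of Vt from the SVD of the centered matrix are those axes.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
# Hypothetical per-frame features (stand-ins for real extracted values):
feat_2d = rng.normal(size=(100, 40))   # e.g. 20 (x, y) landmark coordinates
feat_3d = rng.normal(size=(100, 12))   # e.g. 12 facial animation unit values

visual = np.hstack([feat_2d, feat_3d])  # concatenate the two feature sets
reduced = pca_reduce(visual, 10)        # PCA-reduce to 10 dimensions
print(reduced.shape)  # (100, 10)
```

In a real system the reduced per-frame vectors would then form the observation sequence for the visual stream of the DBN.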