Audio visual emotion recognition based on triple-stream dynamic Bayesian network models

  • Authors:
Dongmei Jiang;Yulu Cui;Xiaojing Zhang;Ping Fan;Isabel Gonzalez;Hichem Sahli

  • Affiliations:
  • Dongmei Jiang, Yulu Cui, Xiaojing Zhang: VUB-NPU Joint Research Group on AVSP, Northwestern Polytechnical University, Xi'an, China; Shaanxi Provincial Key Laboratory on Speech, Image and Information Processing
  • Ping Fan, Isabel Gonzalez: Vrije Universiteit Brussel - AVSP, Department ETRO
  • Hichem Sahli: Vrije Universiteit Brussel - AVSP, Department ETRO; Interuniversity Microelectronics Centre - IMEC, Brussels, Belgium

  • Venue:
  • ACII'11 Proceedings of the 4th international conference on Affective computing and intelligent interaction - Volume Part I
  • Year:
  • 2011

Abstract

We present a triple-stream DBN model (T_AsyDBN) for audio-visual emotion recognition, in which the two audio feature streams are synchronous with each other but asynchronous with the visual feature stream, within controllable constraints. MFCC features and the principal component analysis (PCA) coefficients of local prosodic features serve as the audio streams. For the visual stream, 2D facial features as well as 3D facial animation unit features are defined and concatenated, and the feature dimensionality is reduced by PCA. Emotion recognition experiments on the eNTERFACE'05 database show that, by adjusting the asynchrony constraint, the proposed T_AsyDBN model achieves an 18.73% higher correct recognition rate than the traditional multi-stream state-synchronous HMM (MSHMM), and a 10.21% higher rate than the two-stream asynchronous DBN model (Asy_DBN).
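
As an illustration of the feature pipeline described in the abstract, the sketch below builds the two audio streams (MFCCs, and PCA-reduced prosodic features) and the PCA-reduced visual stream. This is a minimal sketch, not the authors' code: the use of librosa, the pitch/energy stand-ins for the "local prosodic features", and all dimensions are assumptions made for illustration.

```python
# Minimal sketch of the feature pipeline (not the authors' code).
# librosa, the pitch/energy stand-ins for the prosodic features,
# and all dimensions are assumptions.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def audio_streams(wav_path, n_mfcc=13):
    """Two synchronous audio streams: frame-level MFCCs, and PCA
    coefficients of frame-level prosodic features."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, 13)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)             # pitch track
    rms = librosa.feature.rms(y=y)[0]                         # energy track
    n = min(mfcc.shape[0], len(f0), len(rms))
    prosody = np.stack([f0[:n], rms[:n]], axis=1)             # (frames, 2)
    prosody = PCA(n_components=2).fit_transform(prosody)
    return mfcc[:n], prosody

def visual_stream(feat_2d, feat_3d, out_dim=20):
    """Concatenate 2D facial features with 3D facial animation unit
    features, then reduce dimensionality with PCA (out_dim assumed)."""
    feats = np.concatenate([feat_2d, feat_3d], axis=1)
    return PCA(n_components=out_dim).fit_transform(feats)
```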
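
The "controllable constraint" lets the visual stream's hidden-state sequence lag or lead the audio streams' shared state sequence by at most a fixed number of states. In the paper this constraint is encoded inside the DBN topology; the standalone check below, including its variable names, is only a hedged illustration of the idea.

```python
def within_asynchrony(audio_state_index: int, visual_state_index: int,
                      max_asynchrony: int) -> bool:
    """Illustrative form of the asynchrony constraint: at any time slice,
    the audio streams' shared hidden-state index and the visual stream's
    hidden-state index may differ by at most max_asynchrony states."""
    return abs(audio_state_index - visual_state_index) <= max_asynchrony
```

In spirit, setting max_asynchrony to 0 forces the streams back into lockstep, i.e. state-synchronous behaviour like the MSHMM baseline, which is why tuning this bound is what the abstract credits for the reported gains.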