Recognizing affect from speech prosody using hierarchical graphical models
Speech Communication
Audio-visual emotion recognition based on triple-stream dynamic Bayesian network models
ACII'11 Proceedings of the 4th international conference on Affective computing and intelligent interaction - Volume Part I
This paper presents an audio-visual multi-stream DBN model (Asy_DBN) for emotion recognition with constrained asynchrony: the audio state and the visual state each transition independently within their own stream, but the transitions are constrained by the maximum allowed audio-visual asynchrony. Emotion recognition experiments with Asy_DBN under different asynchrony constraints are carried out on an audio-visual speech database of four emotions, and compared with a single-stream HMM, a state-synchronous HMM (Syn_HMM), a state-synchronous DBN model, and a state-asynchronous DBN model without any asynchrony constraint. The results show that, with an appropriately chosen maximum asynchrony constraint between the audio and visual streams, the proposed audio-visual asynchronous DBN model achieves the highest emotion recognition performance, an improvement of 15% over Syn_HMM.
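The core idea of the asynchrony constraint can be illustrated with a small sketch. The function below is hypothetical (the paper's actual DBN has richer structure and probabilities); it only shows how limiting the audio-visual state-index gap shrinks the joint transition space, with `max_async = 0` collapsing to the synchronous (Syn_HMM-like) case and a large value recovering the unconstrained asynchronous case.

```python
def allowed_joint_states(n_states, max_async):
    """Enumerate joint (audio_state, visual_state) pairs whose
    state-index difference stays within the asynchrony bound.

    Hypothetical illustration only: real Asy_DBN inference would
    apply this constraint inside the DBN's transition model.
    """
    return [
        (a, v)
        for a in range((n_states))
        for v in range(n_states)
        if abs(a - v) <= max_async
    ]

# With 4 states per stream:
sync = allowed_joint_states(4, 0)           # fully synchronous: 4 joint states
constrained = allowed_joint_states(4, 1)    # one stream may lead by 1 state
unconstrained = allowed_joint_states(4, 3)  # no effective constraint: all 16 pairs
```

Choosing the bound is the trade-off the paper's experiments explore: too tight forces unnatural lockstep between lip motion and speech prosody, too loose lets the streams drift apart and discards their correlation.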