Investigating glottal parameters and Teager energy operators in emotion recognition
ACII'11 Proceedings of the 4th international conference on Affective computing and intelligent interaction - Volume Part II
Output-associative RVM regression for dimensional and continuous emotion prediction
Image and Vision Computing
AffectAura: an intelligent system for emotional memory
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Automatic natural expression recognition using head movement and skin color features
Proceedings of the International Working Conference on Advanced Visual Interfaces
Static and dynamic 3D facial expression recognition: A comprehensive survey
Image and Vision Computing
A real-time, multimodal, and dimensional affect recognition system
PRICAI'12 Proceedings of the 12th Pacific Rim international conference on Trends in Artificial Intelligence
Dynamic probabilistic CCA for analysis of affective behaviour
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VII
A multimodal approach for online estimation of subtle facial expression
PCM'12 Proceedings of the 13th Pacific-Rim conference on Advances in Multimedia Information Processing
LSTM-Modeling of continuous emotions in an audiovisual affect recognition framework
Image and Vision Computing
Correlated-spaces regression for learning continuous emotion dimensions
Proceedings of the 21st ACM international conference on Multimedia
Audiovisual three-level fusion for continuous estimation of Russell's emotion circumplex
Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge
Diagnosis of depression by behavioural signals: a multimodal approach
Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge
Towards in situ affect detection in mobile devices: a multimodal approach
Proceedings of the 2013 Research in Adaptive and Convergent Systems
"Moon Phrases": a social media faciliated tool for emotional reflection and wellness
Proceedings of the 7th International Conference on Pervasive Computing Technologies for Healthcare
Shape-based modeling of the fundamental frequency contour for emotion detection in speech
Computer Speech and Language
Continuous emotion recognition with phonetic syllables
Speech Communication
Past research on the analysis of human affect has focused on recognizing prototypic expressions of six basic emotions from posed data acquired in laboratory settings. Recently, the field has shifted toward subtle, continuous, and context-specific interpretation of affective displays recorded in naturalistic, real-world settings, and toward multimodal analysis and recognition of human affect. In line with this shift, this paper presents, to the best of our knowledge, the first approach in the literature that: 1) fuses facial expression, shoulder gesture, and audio cues for dimensional and continuous prediction of emotion in valence-arousal space; 2) compares the performance of two state-of-the-art machine learning techniques on the target problem, bidirectional Long Short-Term Memory neural networks (BLSTM-NNs) and Support Vector Machines for Regression (SVR); and 3) proposes an output-associative fusion framework that incorporates the correlations and covariances between the emotion dimensions. The proposed approach is evaluated on spontaneous SAL data from four subjects using subject-dependent leave-one-sequence-out cross-validation. The experimental results show that: 1) on average, BLSTM-NNs outperform SVR due to their ability to learn past and future context; 2) the proposed output-associative fusion framework outperforms feature-level and model-level fusion by modeling and learning the correlations and patterns between the valence and arousal dimensions; and 3) the proposed system reproduces the valence and arousal ground truth obtained from human coders well.
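The core idea of output-associative fusion can be illustrated with a minimal two-stage regression sketch: stage one predicts valence and arousal independently from the fused features; stage two refits each dimension with the other dimension's intermediate prediction appended as an input, so cross-dimensional correlation is exploited. This is only an illustrative sketch with synthetic data and plain ridge regression standing in for the paper's BLSTM-NN/SVR learners; the feature matrix, targets, and helper functions below are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-3):
    # Closed-form ridge regression with an appended bias column.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def ridge_predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))        # stand-in for fused audio-visual features
valence = X[:, 0] + 0.5 * X[:, 1]    # synthetic targets; both depend on
arousal = 0.8 * X[:, 0] - X[:, 2]    # X[:, 0], so the dimensions correlate

# Stage 1: independent predictor per emotion dimension.
w_v = ridge_fit(X, valence)
w_a = ridge_fit(X, arousal)
v_hat = ridge_predict(w_v, X)
a_hat = ridge_predict(w_a, X)

# Stage 2 (output-associative): each final predictor also sees the
# intermediate predictions of *both* dimensions, letting it learn
# valence-arousal dependencies.
Z = np.column_stack([X, v_hat, a_hat])
w_v2 = ridge_fit(Z, valence)
w_a2 = ridge_fit(Z, arousal)
```

In the paper's setting, the stage-two inputs would be sequences of intermediate BLSTM-NN (or SVR) outputs over a temporal window rather than single-frame predictions, but the structure, final predictors conditioned on both dimensions' intermediate outputs, is the same.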