Continuous Prediction of Spontaneous Affect from Multiple Cues and Modalities in Valence-Arousal Space

Authors:
Mihalis A. Nicolaou;Hatice Gunes;Maja Pantic
Affiliations:
Imperial College London, London;Imperial College London, London;Imperial College London, London and University of Twente, The Netherlands
Venue:
IEEE Transactions on Affective Computing
Year:
2011

Abstract

Past research on the analysis of human affect has focused on recognition of prototypic expressions of six basic emotions based on posed data acquired in laboratory settings. Recently, there has been a shift toward subtle, continuous, and context-specific interpretations of affective displays recorded in naturalistic and real-world settings, and toward multimodal analysis and recognition of human affect. Converging with this shift, this paper presents, to the best of our knowledge, the first approach in the literature that: 1) fuses facial expression, shoulder gesture, and audio cues for dimensional and continuous prediction of emotions in valence and arousal space, 2) compares the performance of two state-of-the-art machine learning techniques applied to the target problem, bidirectional Long Short-Term Memory neural networks (BLSTM-NNs) and Support Vector Machines for Regression (SVR), and 3) proposes an output-associative fusion framework that incorporates correlations and covariances between the emotion dimensions. The proposed approach is evaluated on spontaneous SAL data from four subjects using subject-dependent leave-one-sequence-out cross-validation. The experimental results show that: 1) on average, BLSTM-NNs outperform SVR due to their ability to learn past and future context, 2) the proposed output-associative fusion framework outperforms feature-level and model-level fusion by modeling and learning correlations and patterns between the valence and arousal dimensions, and 3) the proposed system is well able to reproduce the valence and arousal ground truth obtained from human coders.
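
The two procedural ideas named in the abstract, leave-one-sequence-out cross-validation and output-associative fusion, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `half_window` parameter, the edge-padding rule, and the layout of the stacked feature vector are all assumptions made for illustration; the abstract states only that the fusion stage uses both emotion dimensions jointly. The first-stage predictions would come from a per-modality regressor such as an SVR or a BLSTM-NN, which is not shown here.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_sequence_out(
    sequences: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training sequences, held-out sequence) pairs for one subject.

    Each sequence of the subject is held out exactly once, which matches a
    subject-dependent leave-one-sequence-out protocol.
    """
    for i, held_out in enumerate(sequences):
        train = [s for j, s in enumerate(sequences) if j != i]
        yield train, held_out


def output_associative_features(
    valence_pred: Sequence[float],
    arousal_pred: Sequence[float],
    half_window: int = 1,  # hypothetical parameter, not taken from the paper
) -> List[List[float]]:
    """Build second-stage inputs from first-stage predictions.

    For every frame t, the row holds the valence predictions in the window
    [t - half_window, t + half_window] followed by the arousal predictions
    in the same window. A second-stage regressor trained on these rows can
    exploit dependencies between the two dimensions.
    """
    if len(valence_pred) != len(arousal_pred):
        raise ValueError("prediction tracks must have the same length")
    n = len(valence_pred)
    features: List[List[float]] = []
    for t in range(n):
        row: List[float] = []
        for track in (valence_pred, arousal_pred):
            for offset in range(-half_window, half_window + 1):
                # Assumption: frames beyond the ends repeat the edge frame.
                idx = min(max(t + offset, 0), n - 1)
                row.append(track[idx])
        features.append(row)
    return features
```

In use, one fold of `leave_one_sequence_out` would select the sequences on which the first-stage and second-stage regressors are trained, and `output_associative_features` would be applied to the first-stage predictions of each sequence before fitting one final regressor per dimension.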