Audiovisual three-level fusion for continuous estimation of Russell's emotion circumplex
Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge
We present experiments on fusing facial video, audio, and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to build regression-based affect estimators for each modality. The single-modality regressors are then combined using particle filtering: the independent regression outputs are treated as measurements of the affect state in a Bayesian filtering framework, where previous observations provide a prediction of the current state through learned affect dynamics. Tested on the Audio/Visual Emotion Challenge (AVEC) dataset, our single-modality estimators achieve substantially higher scores than the official baseline method on every affect dimension. Our filtering-based multimodal fusion achieves correlation performance of 0.344 (baseline: 0.136) on the fully continuous sub-challenge and 0.280 (baseline: 0.096) on the word-level sub-challenge.
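To make the fusion step concrete, below is a minimal Python/NumPy sketch of the general idea the abstract describes: each modality's regression output is treated as a noisy measurement of a latent affect state, and a particle filter with learned temporal dynamics combines them over time. This is not the authors' code; the one-dimensional state, the AR(1) dynamics coefficient, the noise levels, and the function `particle_filter_fusion` are illustrative assumptions standing in for the paper's learned affect dynamics and measurement models.

```python
# Illustrative sketch of particle-filter fusion of per-modality regressors.
# All model parameters here are assumptions, not the paper's learned values.
import numpy as np

rng = np.random.default_rng(0)

A = 0.95                          # assumed affect dynamics: x_t = A*x_{t-1} + noise
q_std = 0.05                      # assumed process-noise std (drift speed of affect)
r_std = np.array([0.3, 0.25, 0.4])  # assumed measurement-noise stds (video, audio, lexical)

def particle_filter_fusion(measurements, n_particles=500):
    """Fuse per-frame modality regressor outputs into one affect estimate.

    measurements: (T, 3) array; column m holds modality m's regression
    output at each frame, treated as a noisy observation of the state.
    Returns the (T,) posterior-mean affect trajectory.
    """
    T = measurements.shape[0]
    particles = rng.normal(0.0, 1.0, n_particles)   # prior over affect state
    weights = np.full(n_particles, 1.0 / n_particles)
    estimates = np.empty(T)

    for t in range(T):
        # Predict: propagate particles through the assumed affect dynamics.
        particles = A * particles + rng.normal(0.0, q_std, n_particles)

        # Update: weight particles by the likelihood of each modality's
        # measurement, assuming independent Gaussian measurement noise.
        for m in range(3):
            z = measurements[t, m]
            weights *= np.exp(-0.5 * ((z - particles) / r_std[m]) ** 2)
        weights /= weights.sum()

        estimates[t] = np.sum(weights * particles)  # posterior-mean estimate

        # Resample when the effective sample size collapses.
        if 1.0 / np.sum(weights ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, n_particles, p=weights)
            particles = particles[idx]
            weights.fill(1.0 / n_particles)

    return estimates

# Toy usage: three noisy "regressor outputs" observing a slowly varying signal.
t = np.linspace(0, 20, 200)
true_affect = np.sin(0.4 * t)
obs = true_affect[:, None] + rng.normal(0.0, r_std, (200, 3))
fused = particle_filter_fusion(obs)
print("per-modality corr:",
      [round(np.corrcoef(obs[:, m], true_affect)[0, 1], 3) for m in range(3)])
print("fused corr:", round(np.corrcoef(fused, true_affect)[0, 1], 3))
```

In this reading, the temporal smoothing from the dynamics model is what lets the filter outperform any single regressor: frame-level measurement noise is averaged out while the state still tracks slow changes in affect. The resampling step is a standard guard against weight degeneracy, not something specific to the paper.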