Audio-visual emotion challenge 2012: a simple approach

  • Authors:
  • Laurens van der Maaten

  • Affiliations:
  • Delft University of Technology, Delft, Netherlands

  • Venue:
  • Proceedings of the 14th ACM international conference on Multimodal interaction
  • Year:
  • 2012

Abstract

The paper presents a small empirical study of emotion and affect recognition based on auditory and visual features, performed in the context of the Audio-Visual Emotion Challenge (AVEC) 2012. The goal of this competition is to predict continuous-valued affect ratings from the provided auditory and visual features, e.g., local binary pattern (LBP) features extracted from aligned face images and spectral audio features. Empirically, we found only very weak (linear) relations between the features and the continuous-valued ratings: our best linear regressors employ the offset feature to exploit the fact that the ratings have a dominant direction (more increasing than decreasing). Much to our surprise, exploiting this bias alone already yields results that improve over the baseline system presented in [10]. The best performance we obtained on the AVEC 2012 test set (averaged over the test set and over the four affective dimensions) is a correlation between predicted and ground-truth ratings of 0.2255 when making continuous predictions, and 0.1920 when making word-level predictions.
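
To make the offset-feature idea concrete, below is a minimal sketch (not the authors' code) of a least-squares linear regressor with an explicit constant offset (bias) feature, evaluated with the correlation measure described above. The names X, y, and t are hypothetical placeholders for the challenge features, the affect ratings, and a frame index.

    import numpy as np

    def fit_linear_with_offset(X, y):
        # Append a constant offset feature so the regressor can absorb the
        # dominant direction (bias) of the ratings.
        X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
        w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
        return w

    def predict(w, X):
        X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
        return X_aug @ w

    def correlation(y_pred, y_true):
        # Pearson correlation between predicted and ground-truth ratings,
        # the evaluation measure reported in the abstract.
        return np.corrcoef(y_pred, y_true)[0, 1]

    # Hypothetical usage: synthetic ratings with a mostly increasing trend,
    # regressed on the frame index alone (plus the offset feature).
    rng = np.random.default_rng(0)
    y = np.linspace(0.0, 1.0, 200) + 0.1 * rng.standard_normal(200)
    t = np.arange(200, dtype=float).reshape(-1, 1)
    w = fit_linear_with_offset(t, y)
    print(correlation(predict(w, t), y))

Because the synthetic ratings trend mostly upward, even this trivial regressor attains a positive correlation, which illustrates how exploiting the dominant direction of the ratings alone can already compete with a feature-based baseline.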