Audio-visual emotion challenge 2012: a simple approach

  • Authors:
  • Laurens van der Maaten

  • Affiliations:
  • Delft University of Technology, Delft, Netherlands

  • Venue:
  • Proceedings of the 14th ACM international conference on Multimodal interaction
  • Year:
  • 2012

Abstract

The paper presents a small empirical study of emotion and affect recognition based on auditory and visual features, performed in the context of the Audio-Visual Emotion Challenge (AVEC) 2012. The goal of this competition is to predict continuous-valued affect ratings from the provided auditory and visual features, e.g., local binary pattern (LBP) features extracted from aligned face images and spectral audio features. Empirically, we found only very weak (linear) relations between the features and the continuous-valued ratings: our best linear regressors employ the offset feature to exploit the fact that the ratings have a dominant direction (more increasing than decreasing). Much to our surprise, exploiting this bias alone already yields results that improve over the baseline system presented in [10]. The best performance we obtained on the AVEC 2012 test set (averaged over the test set and over the four affective dimensions) is a correlation between predicted and ground-truth ratings of 0.2255 when making continuous predictions, and 0.1920 when making word-level predictions.
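
To make the offset-feature idea concrete, below is a minimal sketch (not the authors' code) of a least-squares linear regressor with an explicit constant offset (bias) feature, evaluated with the correlation measure described above. The names X, y, and t are hypothetical placeholders for the challenge features, the affect ratings, and a frame index.

    import numpy as np

    def fit_linear_with_offset(X, y):
        # Append a constant offset feature so the regressor can absorb the
        # dominant direction (bias) of the ratings.
        X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
        w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
        return w

    def predict(w, X):
        X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
        return X_aug @ w

    def correlation(y_pred, y_true):
        # Pearson correlation between predicted and ground-truth ratings,
        # the evaluation measure reported in the abstract.
        return np.corrcoef(y_pred, y_true)[0, 1]

    # Hypothetical usage: synthetic ratings with a mostly increasing trend,
    # regressed on the frame index alone (plus the offset feature).
    rng = np.random.default_rng(0)
    y = np.linspace(0.0, 1.0, 200) + 0.1 * rng.standard_normal(200)
    t = np.arange(200, dtype=float).reshape(-1, 1)
    w = fit_linear_with_offset(t, y)
    print(correlation(predict(w, t), y))

Because the synthetic ratings trend mostly upward, even this trivial regressor attains a positive correlation, which illustrates how exploiting the dominant direction of the ratings alone can already compete with a feature-based baseline.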