Robust continuous prediction of human emotions using multiscale dynamic cues

  • Authors:
  • Jérémie Nicolle; Vincent Rapp; Kévin Bailly; Lionel Prevost; Mohamed Chetouani

  • Affiliations:
  • Université Pierre & Marie Curie, Paris, France; Université Pierre & Marie Curie, Paris, France; Université Pierre & Marie Curie, Paris, France; University of French West Indies & Guiana, Guadeloupe, France; Université Pierre & Marie Curie, Paris, France

  • Venue:
  • Proceedings of the 14th ACM international conference on Multimodal interaction
  • Year:
  • 2012

Abstract

Designing systems able to interact with humans in a natural manner is a complex problem that is far from solved. A key aspect of natural interaction is the ability to understand and appropriately respond to human emotions. This paper details our response to the Audio/Visual Emotion Challenge (AVEC'12), whose goal is to continuously predict four affective signals describing human emotions (namely valence, arousal, expectancy and power). The proposed method uses log-magnitude Fourier spectra to extract multiscale dynamic descriptions of signals characterizing global and local face appearance as well as head movements and voice. We perform kernel regression with very few representative samples selected via supervised, weighted-distance-based clustering, which leads to high generalization power. For feature selection, we introduce a new correlation-based measure that accounts for a possible delay between the labels and the data, which significantly increases robustness. We also propose a particularly fast regressor-level fusion framework to merge systems based on different modalities. Experiments confirm the effectiveness of each key component of the proposed method, and we obtain very promising results.
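
The abstract names two concrete ingredients that lend themselves to illustration: log-magnitude Fourier spectra used as multiscale dynamic descriptors, and a correlation-based feature-selection measure that tolerates a delay between data and labels. The Python sketch below is a minimal illustration of both ideas, assuming 1-D cue signals (e.g. a head-pose angle or an appearance coefficient track) sampled at a fixed rate; the function names, window sizes, delay bound and alignment scheme are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def log_magnitude_spectrum(window, n_fft=None):
    """Log-magnitude Fourier spectrum of one temporal window of a 1-D cue signal."""
    spectrum = np.fft.rfft(window, n=n_fft)
    return np.log(np.abs(spectrum) + 1e-8)  # epsilon avoids log(0)

def multiscale_descriptor(signal, window_sizes=(64, 128, 256)):
    """Concatenate log-magnitude spectra computed on windows of several lengths
    ending at the current frame (window sizes are illustrative assumptions)."""
    parts = []
    for w in window_sizes:
        chunk = signal[-w:] if len(signal) >= w else signal
        parts.append(log_magnitude_spectrum(chunk, n_fft=w))
    return np.concatenate(parts)

def delay_tolerant_correlation(feature, label, max_delay=100):
    """Absolute Pearson correlation between a feature track and a label track,
    maximised over a bounded label delay (labels assumed to lag the data)."""
    best = 0.0
    for d in range(max_delay + 1):
        f = feature[:len(feature) - d] if d else feature
        lab = label[d:]
        if f.std() == 0 or lab.std() == 0:
            continue
        r = abs(np.corrcoef(f, lab)[0, 1])
        best = max(best, r)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    label = np.cumsum(rng.normal(size=1000))               # synthetic affective label
    feature = np.roll(label, -40) + rng.normal(size=1000)  # same signal, label lags by ~40 frames
    print(delay_tolerant_correlation(feature, label, max_delay=100))
    print(multiscale_descriptor(feature).shape)
```

In this sketch, features whose delay-maximised correlation with a label is high would be kept for regression; scoring the correlation over a range of label delays rather than at lag zero is what makes the measure robust to the annotation lag mentioned in the abstract.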