Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns
IEEE Transactions on Pattern Analysis and Machine Intelligence
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)
Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)
Hidden Conditional Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
Social signal processing: Survey of an emerging domain
Image and Vision Computing
Dynamics of facial expression extracted automatically from video
Image and Vision Computing
A Dynamic Texture-Based Approach to Recognition of Facial Actions and Their Temporal Models
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Affective Computing
AVEC 2012: the continuous audio/visual emotion challenge - an introduction
Proceedings of the 14th ACM international conference on Multimodal interaction
This paper presents a small empirical study of emotion and affect recognition from auditory and visual features, performed in the context of the Audio-Visual Emotion Challenge (AVEC) 2012. The goal of the competition is to predict continuous-valued affect ratings from the provided auditory and visual features, e.g., local binary pattern (LBP) features extracted from aligned face images, and spectral audio features. Empirically, we found only very weak (linear) relations between the features and the continuous-valued ratings: our best linear regressors use the offset feature to exploit the fact that the ratings have a dominant direction (more increasing than decreasing). Much to our surprise, exploiting this bias alone already yields results that improve on the baseline system presented in [10]. The best performance we obtained on the AVEC 2012 test set (averaged over the test set and over four affective dimensions) is a correlation between predicted and ground-truth ratings of 0.2255 for continuous predictions and 0.1920 for word-level predictions.
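The effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one possible reading of the offset feature, namely a regressor fitted to frame-to-frame rating changes that has only a constant term, so that accumulating its predictions gives a ramp. The data are synthetic, and the helper names (`pearson`, `fit_offset_only`, `predict_trace`, `make_trace`) and all parameter values are hypothetical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation of two 1-D arrays (0.0 if either one is constant)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def fit_offset_only(train_traces):
    """Fit a regressor with only an offset term to frame-to-frame changes.

    With no other features, the least-squares solution is the mean change.
    """
    deltas = np.concatenate([np.diff(t) for t in train_traces])
    return float(deltas.mean())


def predict_trace(offset, n_frames):
    """Accumulate the constant predicted change into a rating trace (a ramp)."""
    return offset * np.arange(n_frames)


# Synthetic ratings (assumption): random walks with a slight upward drift,
# standing in for ratings that increase more than they decrease.
rng = np.random.default_rng(0)


def make_trace(n=500, drift=0.002, noise=0.02):
    return np.cumsum(drift + noise * rng.standard_normal(n))


train = [make_trace() for _ in range(20)]
test = [make_trace() for _ in range(20)]

offset = fit_offset_only(train)

# Score each test sequence separately, then average the correlations.
scores = [pearson(predict_trace(offset, len(t)), t) for t in test]
mean_corr = float(np.mean(scores))
print(f"learned offset: {offset:.4f}, mean correlation: {mean_corr:.3f}")
```

In this toy setting the prediction ignores the audio and visual features entirely, yet it correlates positively with the drifting traces on average, which is the kind of bias the abstract reports exploiting. The numbers it prints say nothing about the AVEC 2012 results.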