In this paper, we propose a method for video-based human emotion recognition. For each video clip, all frames are represented as an image set, which is modeled as a linear subspace, i.e., a point on a Grassmannian manifold. After feature extraction, class-specific one-to-rest Partial Least Squares (PLS) classifiers are learned separately on the video and audio data to distinguish each class from the remaining, easily confused ones. Finally, an optimal fusion of the classifiers learned from the two modalities (video and audio) is performed at the decision level. Our method is evaluated on the Emotion Recognition In The Wild Challenge (EmotiW 2013). Experimental results on both the validation set and the blind test set are presented for comparison; the final accuracy achieved on the test set outperforms the baseline by 26%.
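As a rough illustration of the pipeline described above, the sketch below (Python with NumPy and scikit-learn) shows how per-clip frame features could be modeled as a linear subspace via SVD, compared through a Grassmannian projection kernel, classified with one-to-rest PLS, and fused across modalities at the decision level. All function names, the choice of the projection kernel, the number of PLS components, and the fusion weight are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the abstract's pipeline, under the assumptions stated above.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def clip_subspace(frame_feats, dim=10):
    """Model a clip's image set as a linear subspace (a point on the Grassmannian).
    frame_feats: (n_frames, feat_dim) matrix of per-frame features."""
    # Orthonormal basis of the leading principal directions via thin SVD.
    u, _, _ = np.linalg.svd(frame_feats.T, full_matrices=False)
    return u[:, :dim]                                   # (feat_dim, dim) basis

def projection_kernel(bases_a, bases_b):
    """Grassmannian projection kernel k(Ya, Yb) = ||Ya^T Yb||_F^2 between basis lists."""
    return np.array([[np.linalg.norm(Ya.T @ Yb, 'fro') ** 2 for Yb in bases_b]
                     for Ya in bases_a])

def fit_one_to_rest_pls(K_train, labels, n_classes, n_components=5):
    """One class-specific PLS regressor per class on the kernel features."""
    models = []
    for c in range(n_classes):
        y = np.where(labels == c, 1.0, -1.0)            # +1 for the class, -1 for the rest
        models.append(PLSRegression(n_components=n_components).fit(K_train, y))
    return models

def pls_scores(models, K_test):
    """Stack per-class regression outputs as decision scores."""
    return np.column_stack([m.predict(K_test).ravel() for m in models])

def fuse_and_predict(video_scores, audio_scores, alpha=0.6):
    """Decision-level fusion: weighted sum of modality scores, then argmax."""
    return np.argmax(alpha * video_scores + (1 - alpha) * audio_scores, axis=1)
```

In this sketch, each test clip is represented by one row of kernel similarities against the training clips' subspaces; the audio branch would run the same one-to-rest PLS on clip-level audio descriptors, and the fusion weight alpha would be tuned on the validation set.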