Disentangling factors of variation for facial expression recognition

  • Authors:
  • Salah Rifai; Yoshua Bengio; Aaron Courville; Pascal Vincent; Mehdi Mirza

  • Affiliations:
  • Department of Computer Science and Operations Research, Université de Montréal, Canada (all authors)

  • Venue:
  • ECCV'12: Proceedings of the 12th European Conference on Computer Vision, Part VI
  • Year:
  • 2012

Abstract

We propose a semi-supervised approach to the task of emotion recognition in 2D face images, using recent ideas from deep learning to handle the factors of variation present in the data. An emotion classification algorithm should be robust to both (1) the variations in pose that remain after the face has been centered and aligned in the image, and (2) the identity or morphology of the face. To achieve this invariance, we propose to learn a hierarchy of features that gradually filters out the factors of variation arising from both (1) and (2). We address (1) with a multi-scale contractive convolutional network (CCNET), which provides invariance to translations of the facial traits in the image. On top of the feature representation produced by the CCNET, we train a Contractive Discriminative Analysis (CDA) feature extractor, a novel variant of the Contractive Auto-Encoder (CAE), designed to learn a representation that separates the emotion-related factors from the others (which mostly capture subject identity, and whatever pose variation remains after the CCNET). This system beats the state of the art on a recently proposed dataset for facial expression recognition, the Toronto Face Database, raising the state-of-the-art accuracy from 82.4% to 85.0%, while the CCNET and CDA improve the accuracy of a standard CAE by 8%.
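The CAE that the CDA builds on penalizes the Frobenius norm of the encoder's Jacobian, which encourages the learned features to be locally invariant to small input perturbations. Below is a minimal sketch of that penalty for a single sigmoid encoder layer with tied decoder weights; the PyTorch framing, layer sizes, and penalty weight `lam` are illustrative assumptions rather than the paper's implementation, and the CDA's discriminative term is only indicated in a comment since its exact form is not given in this abstract.

```python
import torch
import torch.nn as nn

class ContractiveAutoEncoder(nn.Module):
    """Single-layer CAE with tied weights and a sigmoid encoder (sketch)."""

    def __init__(self, n_in=48 * 48, n_hidden=512):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(n_hidden, n_in))
        self.b_h = nn.Parameter(torch.zeros(n_hidden))
        self.b_r = nn.Parameter(torch.zeros(n_in))

    def forward(self, x):
        h = torch.sigmoid(x @ self.W.t() + self.b_h)   # encoder h = f(x)
        r = torch.sigmoid(h @ self.W + self.b_r)       # tied-weight decoder
        return h, r

    def loss(self, x, lam=0.1):
        h, r = self(x)
        rec = ((r - x) ** 2).sum(dim=1).mean()         # reconstruction error
        # Contraction penalty ||J_f(x)||_F^2. For a sigmoid encoder the
        # Jacobian is J_ji = h_j (1 - h_j) W_ji, giving the closed form
        # sum_j (h_j (1 - h_j))^2 * ||W_j||^2.
        jac = ((h * (1 - h)) ** 2) @ (self.W ** 2).sum(dim=1)
        # The CDA variant described in the abstract would add a
        # discriminative term here, tying a subset of the hidden units
        # to emotion labels so that emotion-related factors separate
        # from identity/pose factors (exact form not specified here).
        return rec + lam * jac.mean()

# Illustrative usage on a batch of flattened face images (the 48x48
# input size matches Toronto Face Database images; batch size assumed):
cae = ContractiveAutoEncoder()
x = torch.rand(32, 48 * 48)
cae.loss(x).backward()
```

The closed-form Jacobian penalty is what makes the CAE cheap to train: for a sigmoid encoder it reduces to an elementwise expression in the hidden activations and the squared row norms of the weight matrix, avoiding any explicit Jacobian computation.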