Exploring bag of words architectures in the facial expression domain

  • Authors:
  • Karan Sikka; Tingfan Wu; Josh Susskind; Marian Bartlett

  • Affiliations:
  • Machine Perception Laboratory, University of California San Diego (all authors)

  • Venue:
  • ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2
  • Year:
  • 2012

Abstract

Automatic facial expression recognition (AFER) has advanced substantially over the past two decades. This work explores the application of bag of words (BoW), a mature approach to object and scene recognition, to AFER. We first highlight the reasons that make the BoW task differ for AFER compared with object and scene recognition, and then propose suitable extensions to the BoW architecture for the AFER task. These extensions address some of the limitations of current state-of-the-art appearance-based approaches to AFER. Our BoW architecture is based on the spatial pyramid framework, augmented by multiscale dense SIFT features and a recently proposed approach for object classification: locality-constrained linear coding and max-pooling. Combining these, we achieve a powerful facial representation that works well even with linear classifiers. We show that a well-designed BoW architecture can provide a performance benefit for AFER, and we empirically evaluate elements of the proposed architecture. The proposed BoW approach surpasses previous state-of-the-art results, achieving an average recognition rate of 96% on AFER for two public datasets.
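
The coding and pooling stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it uses the common k-nearest-neighbour approximation of locality-constrained linear coding, and the function names (llc_encode, spatial_pyramid_max_pool), the parameters k and beta, and the pyramid levels (1, 2, 4) are all assumptions chosen for illustration. Dense SIFT extraction, codebook learning, and the linear classifier are omitted; descriptors, their (row, col) positions, and the codebook are assumed to be given as arrays.

```python
import numpy as np


def llc_encode(descriptors, codebook, k=5, beta=1e-4):
    """Approximate LLC: code each descriptor over its k nearest codewords.

    descriptors: (n, d) array; codebook: (m, d) array; returns (n, m) codes.
    k and beta are illustrative defaults, not values from the paper.
    """
    n, m = descriptors.shape[0], codebook.shape[0]
    # Squared Euclidean distances between every descriptor and codeword.
    d2 = (
        (descriptors ** 2).sum(axis=1)[:, None]
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    nearest = np.argsort(d2, axis=1)[:, :k]
    codes = np.zeros((n, m))
    for i in range(n):
        idx = nearest[i]
        z = codebook[idx] - descriptors[i]  # centre codewords on the descriptor
        cov = z @ z.T                       # local covariance, shape (k, k)
        cov += np.eye(k) * beta * max(np.trace(cov), 1e-12)  # regularise
        w = np.linalg.solve(cov, np.ones(k))
        codes[i, idx] = w / w.sum()         # enforce the sum-to-one constraint
    return codes


def spatial_pyramid_max_pool(codes, positions, image_size, levels=(1, 2, 4)):
    """Max-pool codes inside each cell of each pyramid level, then concatenate.

    positions: (n, 2) array of (row, col) pixel coordinates of the descriptors.
    image_size: (height, width). levels are grid sizes per pyramid level.
    """
    h, w = image_size
    pooled = []
    for g in levels:
        # Index of the g x g grid cell that contains each descriptor.
        rows = np.minimum((positions[:, 0] * g / h).astype(int), g - 1)
        cols = np.minimum((positions[:, 1] * g / w).astype(int), g - 1)
        cell = rows * g + cols
        for c in range(g * g):
            mask = cell == c
            if mask.any():
                pooled.append(codes[mask].max(axis=0))
            else:
                pooled.append(np.zeros(codes.shape[1]))  # empty cell
    feat = np.concatenate(pooled)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat  # L2-normalise the final vector
```

With a codebook of m codewords and the levels above, the pooled vector has 21 × m entries (1 + 4 + 16 cells); a linear classifier could then be trained on these vectors.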