Towards a dynamic expression recognition system under facial occlusion

  • Authors:
  • Xiaohua Huang; Guoying Zhao; Wenming Zheng; Matti Pietikäinen

  • Affiliations:
  • Center for Machine Vision Research, Department of Computer Science and Engineering, University of Oulu, Oulu, 90014, Finland and Key Laboratory of Child Development and Learning Science (Ministry ...;Center for Machine Vision Research, Department of Computer Science and Engineering, University of Oulu, Oulu, 90014, Finland;Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing, JiangSu 210096, China;Center for Machine Vision Research, Department of Computer Science and Engineering, University of Oulu, Oulu, 90014, Finland

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2012

Quantified Score

Hi-index 0.10

Abstract

Facial occlusion is a challenging research topic in facial expression recognition (FER). Extending FER to uncontrolled environments therefore requires both robust facial representations and occlusion detection methods. Most previous work addresses these two issues separately, and on static images. We are thus motivated to propose a complete system for video sequences consisting of facial representation, occlusion detection, and multiple feature fusion. To achieve a robust facial representation, we derive six feature vectors from the eyes, nose and mouth components. These features, which carry temporal cues, are generated by dynamic texture and structural shape feature descriptors. Occlusion detection, on the other hand, is still mainly realized by traditional classifiers or model comparison. Recently, sparse representation has been proposed as an efficient method against occlusion, but in FER it is correlated with facial identity unless an appropriate facial representation is used. We therefore present an evaluation demonstrating that the proposed facial representation is independent of facial identity. Inspired by Mercier et al. (2007), we then apply sparse representation and residual statistics to occlusion detection in image sequences. Because concatenating the six feature vectors into one causes the curse of dimensionality, we propose multiple feature fusion consisting of a fusion module and weight learning. Experimental results on the Extended Cohn-Kanade database and a simulated database derived from it demonstrate that our framework outperforms state-of-the-art methods for FER in normal videos and, especially, in partially occluded videos.
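
The idea of detecting occlusion from sparse-representation residuals and then down-weighting occluded components during fusion can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the solver, the residual statistic, or the fusion rule, so the ISTA solver, the fixed threshold of 0.5, the random stand-in features, and the names `ista`, `residual` and `fuse` are all assumptions made for illustration.

```python
import numpy as np

def ista(D, y, lam=0.05, n_iter=200):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1
    by iterative soft-thresholding (an assumed solver)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def residual(D, y, lam=0.05):
    """Reconstruction error of y under a sparse code over dictionary D."""
    return float(np.linalg.norm(y - D @ ista(D, y, lam)))

def fuse(scores, weights):
    """Weighted sum of per-component class scores.
    scores: (n_components, n_classes); weights must not all be zero."""
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ np.asarray(scores, dtype=float)

# Hypothetical dictionary: 10 unoccluded training features of dimension 20.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 10))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms

clean = D[:, 3].copy()                     # a feature the dictionary explains

# Synthetic stand-in for occlusion: add a unit-norm component that lies
# outside the span of the dictionary, so no code can reconstruct it.
e = rng.normal(size=20)
e -= D @ np.linalg.lstsq(D, e, rcond=None)[0]
e /= np.linalg.norm(e)
occluded = clean + e

r_clean = residual(D, clean)
r_occ = residual(D, occluded)

THRESHOLD = 0.5                            # assumed; the paper uses residual statistics
weights = [0.0 if r > THRESHOLD else 1.0 for r in (r_clean, r_occ)]

# Per-component class scores (made-up numbers for two components, two classes).
fused = fuse([[0.9, 0.1], [0.2, 0.8]], weights)
```

In this sketch the occluded component receives zero weight, so the fused score comes only from the component whose residual stays below the threshold. A learned, non-binary weighting, as the abstract's "weight learning" suggests, would replace the hard threshold.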