When Human Coders (and Machines) Disagree on the Meaning of Facial Affect in Spontaneous Videos

  • Authors:
  • Mohammed E. Hoque; Rana el Kaliouby; Rosalind W. Picard

  • Affiliations:
  • Media Laboratory, Massachusetts Institute of Technology, Cambridge 02142; Media Laboratory, Massachusetts Institute of Technology, Cambridge 02142; Media Laboratory, Massachusetts Institute of Technology, Cambridge 02142

  • Venue:
  • IVA '09 Proceedings of the 9th International Conference on Intelligent Virtual Agents
  • Year:
  • 2009

Abstract

This paper describes the challenges of obtaining ground-truth affective labels for spontaneous video, and presents implications for systems such as virtual agents that have automated facial analysis capabilities. We first present a dataset from an intelligent tutoring application and describe the most prevalent approach to labeling such data. We then present an alternative labeling approach, which closely models how the majority of automated facial analysis systems are designed. We show that while participants, peers, and trained judges report high inter-rater agreement on expressions of delight, confusion, flow, frustration, boredom, surprise, and neutral when shown the entire 30 minutes of video for each participant, inter-rater agreement drops below chance when human coders are asked to watch and label short 8-second clips for the same set of labels. We also perform discriminative analysis of facial action units for each affective state represented in the clips. The results emphasize that human coders rely heavily on factors such as familiarity with the person and the context of the interaction to correctly infer a person's affective state; without this information, the reliability with which humans, as well as machines, attribute affective labels to spontaneous facial and head movements drops significantly.
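
The abstract reports inter-rater agreement dropping below chance on short clips but does not state which agreement statistic was used. The sketch below is purely illustrative: it computes Fleiss' kappa, one common chance-corrected measure of multi-rater agreement, over a hypothetical matrix of clip-label counts. The seven labels follow the abstract; the number of raters and the toy counts are invented for the example and are not the study's data.

```python
# Illustrative sketch: Fleiss' kappa for multi-rater agreement on clip labels.
# The paper does not specify its agreement statistic; the rater counts below
# are hypothetical.

from typing import List

# Affective states listed in the abstract; columns of the count matrix
# are assumed to follow this order.
LABELS = ["delight", "confusion", "flow", "frustration",
          "boredom", "surprise", "neutral"]

def fleiss_kappa(counts: List[List[int]]) -> float:
    """counts[i][j] = number of raters who assigned clip i to category j.
    Assumes every clip is rated by the same number of raters."""
    n_clips = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Proportion of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_clips * n_raters)
           for j in range(n_cats)]

    # Observed agreement for each clip.
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    P_bar = sum(P_i) / n_clips       # mean observed agreement
    P_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 3 raters labeling 4 short clips over the 7 categories above.
toy_counts = [
    [3, 0, 0, 0, 0, 0, 0],   # unanimous "delight"
    [1, 1, 0, 1, 0, 0, 0],   # three-way split
    [0, 0, 2, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 1],
]
print(f"Fleiss' kappa: {fleiss_kappa(toy_counts):.3f}")
```

A kappa near zero indicates agreement no better than chance, which is the regime the abstract describes for isolated 8-second clips; values near one indicate near-perfect agreement, closer to what the judges reported when viewing the full 30-minute interactions.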