When Human Coders (and Machines) Disagree on the Meaning of Facial Affect in Spontaneous Videos

  • Authors:
  • Mohammed E. Hoque; Rana el Kaliouby; Rosalind W. Picard

  • Affiliations:
  • Media Laboratory, Massachusetts Institute of Technology, Cambridge 02142; Media Laboratory, Massachusetts Institute of Technology, Cambridge 02142; Media Laboratory, Massachusetts Institute of Technology, Cambridge 02142

  • Venue:
  • IVA '09 Proceedings of the 9th International Conference on Intelligent Virtual Agents
  • Year:
  • 2009

Abstract

This paper describes the challenges of obtaining ground-truth affective labels for spontaneous video, and presents implications for systems such as virtual agents that have automated facial analysis capabilities. We first present a dataset from an intelligent tutoring application and describe the most prevalent approach to labeling such data. We then present an alternative labeling approach, which closely models how the majority of automated facial analysis systems are designed. We show that while participants, peers, and trained judges report high inter-rater agreement on expressions of delight, confusion, flow, frustration, boredom, surprise, and neutral when shown the entire 30 minutes of video for each participant, inter-rater agreement drops below chance when human coders are asked to watch and label short 8-second clips for the same set of labels. We also perform discriminative analysis of facial action units for each affective state represented in the clips. The results emphasize that human coders rely heavily on factors such as familiarity with the person and the context of the interaction to correctly infer a person's affective state; without this information, the reliability with which humans, as well as machines, attribute affective labels to spontaneous facial and head movements drops significantly.
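
The abstract reports inter-rater agreement dropping below chance on short clips but does not state which agreement statistic was used. The sketch below is purely illustrative: it computes Fleiss' kappa, one common chance-corrected measure of multi-rater agreement, over a hypothetical matrix of clip-label counts. The seven labels follow the abstract; the number of raters and the toy counts are invented for the example and are not the study's data.

```python
# Illustrative sketch: Fleiss' kappa for multi-rater agreement on clip labels.
# The paper does not specify its agreement statistic; the rater counts below
# are hypothetical.

from typing import List

# Affective states listed in the abstract; columns of the count matrix
# are assumed to follow this order.
LABELS = ["delight", "confusion", "flow", "frustration",
          "boredom", "surprise", "neutral"]

def fleiss_kappa(counts: List[List[int]]) -> float:
    """counts[i][j] = number of raters who assigned clip i to category j.
    Assumes every clip is rated by the same number of raters."""
    n_clips = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Proportion of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_clips * n_raters)
           for j in range(n_cats)]

    # Observed agreement for each clip.
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    P_bar = sum(P_i) / n_clips       # mean observed agreement
    P_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 3 raters labeling 4 short clips over the 7 categories above.
toy_counts = [
    [3, 0, 0, 0, 0, 0, 0],   # unanimous "delight"
    [1, 1, 0, 1, 0, 0, 0],   # three-way split
    [0, 0, 2, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 1],
]
print(f"Fleiss' kappa: {fleiss_kappa(toy_counts):.3f}")
```

A kappa near zero indicates agreement no better than chance, which is the regime the abstract describes for isolated 8-second clips; values near one indicate near-perfect agreement, closer to what the judges reported when viewing the full 30-minute interactions.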