The face speaks: contextual and temporal sensitivity to backchannel responses

  • Authors and affiliations:
  • Andrew J. Aubrey, School of Computer Science and Informatics, Cardiff University, Cardiff, UK
  • Douglas W. Cunningham, Brandenburg Technical University Cottbus, Germany
  • David Marshall, School of Computer Science and Informatics, Cardiff University, Cardiff, UK
  • Paul L. Rosin, School of Computer Science and Informatics, Cardiff University, Cardiff, UK
  • AhYoung Shin, Korea University, Korea
  • Christian Wallraven, Korea University, Korea

  • Venue:
  • ACCV'12: Proceedings of the 11th Asian Conference on Computer Vision - Volume 2
  • Year:
  • 2012


Abstract

It is often assumed that one person in a conversation is active (the speaker) while the rest are passive (the listeners). Conversation analysis has shown, however, that listeners take an active part in the conversation, providing feedback signals that can control conversational flow. The face plays a vital role in these backchannel responses. A deeper understanding of facial backchannel signals is crucial for many applications in social signal processing, including the automatic modeling and analysis of conversations and the development of life-like, effective conversational agents. Here, we present results from two experiments testing sensitivity to the context and the timing of backchannel responses. We used sequences from a newly recorded database of 5-minute, two-person conversations. Experiment 1 tested how well participants could match backchannel sequences to their corresponding speaker sequences; on average, participants performed well above chance. Experiment 2 tested how sensitive participants were to temporal misalignments of the backchannel sequence. Interestingly, participants were able to estimate the correct temporal alignment of the sequence pairs. Taken together, our results show that human conversational skills are highly tuned to both the context and the timing of backchannel responses, underscoring the need for accurate modeling of conversations in social signal processing.