The face speaks: contextual and temporal sensitivity to backchannel responses

  • Authors and affiliations:
  • Andrew J. Aubrey, School of Computer Science and Informatics, Cardiff University, Cardiff, UK
  • Douglas W. Cunningham, Brandenburg Technical University Cottbus, Germany
  • David Marshall, School of Computer Science and Informatics, Cardiff University, Cardiff, UK
  • Paul L. Rosin, School of Computer Science and Informatics, Cardiff University, Cardiff, UK
  • AhYoung Shin, Korea University, Korea
  • Christian Wallraven, Korea University, Korea

  • Venue:
  • ACCV'12: Proceedings of the 11th Asian Conference on Computer Vision - Volume 2
  • Year:
  • 2012


Abstract

It is often assumed that one person in a conversation is active (the speaker) while the rest are passive (the listeners). Conversation analysis has shown, however, that listeners take an active part in the conversation, providing feedback signals that can control conversational flow. The face plays a vital role in these backchannel responses. A deeper understanding of facial backchannel signals is crucial for many applications in social signal processing, including the automatic modeling and analysis of conversations and the development of life-like, effective conversational agents. Here, we present results from two experiments testing sensitivity to the context and the timing of backchannel responses. We used sequences from a newly recorded database of 5-minute, two-person conversations. Experiment 1 tested how well participants could match backchannel sequences to their corresponding speaker sequences; on average, participants performed well above chance. Experiment 2 tested how sensitive participants were to temporal misalignments of the backchannel sequence. Interestingly, participants were able to estimate the correct temporal alignment of the sequence pairs. Taken together, our results show that human conversational skills are highly tuned to both the context and the timing of backchannel responses, underscoring the need for accurate modeling of conversations in social signal processing.