Context-based recognition during human interactions: automatic feature selection and encoding dictionary

  • Authors:
  • Louis-Philippe Morency; Iwan de Kok; Jonathan Gratch

  • Affiliations:
  • USC Institute for Creative Technologies, Marina del Rey, CA, USA; University of Twente, Enschede, Netherlands; USC Institute for Creative Technologies, Marina del Rey, CA, USA

  • Venue:
  • ICMI '08: Proceedings of the 10th International Conference on Multimodal Interfaces
  • Year:
  • 2008

Abstract

During face-to-face conversation, people use visual feedback such as head nods to communicate relevant information and to synchronize rhythm between participants. In this paper we describe how contextual information from other participants can be used to predict visual feedback and improve recognition of head gestures in human-human interactions. For example, in a dyadic interaction, a speaker's contextual cues, such as gaze shifts or changes in prosody, influence the listener's backchannel feedback (e.g., a head nod). To automatically learn how to integrate this contextual information into the listener gesture recognition framework, this paper addresses two main challenges: optimal feature representation using an encoding dictionary and automatic selection of optimal feature-encoding pairs. Multimodal integration between contextual and visual observations is performed using a discriminative sequential model (Latent-Dynamic Conditional Random Fields) trained on previous interactions. In our experiments involving 38 storytelling dyads, our context-based recognizer significantly improved head gesture recognition performance over a vision-only recognizer.
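
The encoding-dictionary idea from the abstract can be pictured with a minimal sketch. The code below is an illustrative assumption, not the authors' implementation: it expands each binary contextual cue (e.g., a speaker pause or gaze shift) into several candidate encodings, producing the pool of feature-encoding pairs from which an automatic selection step would later pick. The encoding names, widths, delays, and helper functions are hypothetical choices made for this example.

```python
# Hypothetical sketch of an "encoding dictionary": each raw contextual cue
# (a binary signal over time) is expanded into several candidate encodings.
# Encoding names and parameters (widths, delays) are assumptions for illustration.
import numpy as np

def encode_binary(cue, width):
    """Hold the response 'on' for `width` frames after each onset of the cue."""
    out = np.zeros(len(cue), dtype=float)
    onsets = np.flatnonzero(np.diff(np.concatenate(([0], cue))) == 1)
    for t in onsets:
        out[t:t + width] = 1.0
    return out

def encode_step(cue, delay, width):
    """Same as binary, but the response starts `delay` frames after each onset."""
    shifted = np.concatenate((np.zeros(delay, dtype=int), np.asarray(cue)))[:len(cue)]
    return encode_binary(shifted, width)

def encode_ramp(cue, width):
    """Linearly decaying response over `width` frames after each onset."""
    out = np.zeros(len(cue), dtype=float)
    onsets = np.flatnonzero(np.diff(np.concatenate(([0], cue))) == 1)
    ramp = np.linspace(1.0, 0.0, width, endpoint=False)
    for t in onsets:
        end = min(t + width, len(cue))
        out[t:end] = np.maximum(out[t:end], ramp[:end - t])
    return out

# The encoding dictionary: each entry maps a name to an encoding template.
ENCODING_DICTIONARY = {
    "binary_w10":  lambda c: encode_binary(c, width=10),
    "step_d5_w10": lambda c: encode_step(c, delay=5, width=10),
    "ramp_w20":    lambda c: encode_ramp(c, width=20),
}

def expand_features(contextual_cues):
    """Expand every (cue, encoding) pair into one candidate feature column."""
    columns, names = [], []
    for cue_name, cue in contextual_cues.items():
        for enc_name, encode in ENCODING_DICTIONARY.items():
            columns.append(encode(np.asarray(cue)))
            names.append(f"{cue_name}|{enc_name}")
    return np.stack(columns, axis=1), names

if __name__ == "__main__":
    T = 200  # frames of one dyadic interaction
    rng = np.random.default_rng(0)
    cues = {
        "speaker_pause":      (rng.random(T) > 0.95).astype(int),
        "speaker_gaze_shift": (rng.random(T) > 0.97).astype(int),
    }
    X_context, feature_names = expand_features(cues)
    print(X_context.shape)   # (200, 6): 2 cues x 3 encodings
    print(feature_names)
```

In the paper's framework, the selected feature-encoding columns would then be combined with the vision-based head gesture observations and fed to a sequential model such as the Latent-Dynamic Conditional Random Field; that integration and the selection procedure themselves are not shown in this sketch.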