During face-to-face conversation, people use visual feedback such as head nods to communicate relevant information and to synchronize rhythm between participants. In this paper we describe how contextual information from other participants can be used to predict visual feedback and improve recognition of head gestures in human-human interactions. For example, in a dyadic interaction, the speaker's contextual cues, such as gaze shifts or changes in prosody, influence the listener's backchannel feedback (e.g., head nods). To automatically learn how to integrate this contextual information into the listener gesture recognition framework, this paper addresses two main challenges: optimal feature representation using an encoding dictionary, and automatic selection of optimal feature-encoding pairs. Multimodal integration of context and visual observations is performed using a discriminative sequential model (Latent-Dynamic Conditional Random Fields) trained on previous interactions. In our experiments involving 38 storytelling dyads, our context-based recognizer significantly improved head gesture recognition performance over a vision-only recognizer.
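The two challenges named above — an encoding dictionary for contextual cues and automatic selection of feature-encoding pairs — can be illustrated with a minimal sketch. This is not the paper's implementation: the encoding functions (`binary`, `step`, `ramp`), the co-occurrence scoring, and all feature names are illustrative assumptions, and the actual system trains a Latent-Dynamic CRF rather than this simple relevance ranking.

```python
# Hedged sketch of an encoding dictionary for speaker contextual cues and a
# simple selection of feature-encoding pairs. All names and encodings here are
# assumptions for illustration, not the paper's exact dictionary or selector.

def encode_binary(stream):
    """Pass the contextual cue through unchanged."""
    return [float(v) for v in stream]

def encode_step(stream, width=3):
    """Hold the cue 'on' for `width` frames after each onset, modeling
    listener feedback that follows a speaker cue with some delay."""
    out = [0.0] * len(stream)
    for t, v in enumerate(stream):
        if v:
            for k in range(t, min(t + width, len(stream))):
                out[k] = 1.0
    return out

def encode_ramp(stream, width=3):
    """Linearly decay the cue's influence over `width` frames after onset."""
    out = [0.0] * len(stream)
    for t, v in enumerate(stream):
        if v:
            for k in range(width):
                if t + k < len(stream):
                    out[t + k] = max(out[t + k], 1.0 - k / width)
    return out

# The encoding dictionary: each contextual feature can be paired with any of
# these temporal encodings.
ENCODING_DICTIONARY = {
    "binary": encode_binary,
    "step": encode_step,
    "ramp": encode_ramp,
}

def select_feature_encoding_pairs(features, labels, top_k=2):
    """Rank (feature, encoding) pairs by a simple co-occurrence score with
    the listener-gesture labels -- a stand-in for the automatic selection
    step; the full model would feed the selected pairs into a sequential
    classifier such as an LDCRF."""
    scored = []
    for name, stream in features.items():
        for enc_name, enc in ENCODING_DICTIONARY.items():
            encoded = enc(stream)
            score = sum(e * y for e, y in zip(encoded, labels))
            scored.append((score, name, enc_name))
    scored.sort(reverse=True)
    return [(name, enc_name) for _, name, enc_name in scored[:top_k]]

if __name__ == "__main__":
    # Toy frame-level streams (hypothetical): speaker cues and listener nods.
    features = {
        "speaker_pause": [0, 0, 1, 0, 0, 0, 0, 0],
        "gaze_shift":    [0, 1, 0, 0, 0, 0, 0, 0],
    }
    head_nod_labels =   [0, 0, 0, 1, 1, 0, 0, 0]
    print(select_feature_encoding_pairs(features, head_nod_labels))
```

In this toy run, the `step`-encoded `speaker_pause` cue overlaps the nod labels most, so it ranks first; the point is only that different temporal encodings of the same contextual cue align differently with listener feedback, which is why pairs, not raw features, are selected.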