Contextual recognition of head gestures

Authors:
Louis-Philippe Morency;Candace Sidner;Christopher Lee;Trevor Darrell
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;Mitsubishi Electric Research Laboratories, Cambridge, MA;Mitsubishi Electric Research Laboratories, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA
Venue:
ICMI '05 Proceedings of the 7th international conference on Multimodal interfaces
Year:
2005

Citing 13
Cited 46

Embodied agents for multi-party dialogue in immersive virtual worlds

Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 2
Collagen: applying collaborative discourse theory to human-computer interaction

AI Magazine
Tracking Focus of Attention in Meetings

ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Fast Stereo-Based Head Tracking for Interactive Environments

FGR '02 Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition
Context-based vision system for place and object recognition

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
A multi-modal approach for determining speaker location and focus

Proceedings of the 5th international conference on Multimodal interfaces
A real-time head nod and shake detector

Proceedings of the 2001 workshop on Perceptive user interfaces
Impact of video editing based on participants' gaze in multiparty conversation

CHI '04 Extended Abstracts on Human Factors in Computing Systems
Teaching and Working with Robots as a Collaboration

AAMAS '04 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3
Towards a model of face-to-face grounding

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Explorations in engagement for humans and robots

Artificial Intelligence
Behavior planning for a reflexive agent

IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Adaptive view-based appearance models

CVPR'03 Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition

Head gesture recognition in intelligent interfaces: the role of context in improving recognition

Proceedings of the 11th international conference on Intelligent user interfaces
The effect of head-nod recognition in human-robot conversation

Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction
Recognizing gaze aversion gestures in embodied conversational discourse

Proceedings of the 8th international conference on Multimodal interfaces
Hidden Conditional Random Fields

IEEE Transactions on Pattern Analysis and Machine Intelligence
Automatic inference of cross-modal nonverbal interactions in multiparty conversations: "who responds to whom, when, and how?" from gaze, head gestures, and utterances

Proceedings of the 9th international conference on Multimodal interfaces
HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces

HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces
Does the contingency of agents' nonverbal feedback affect users' social anxiety?

Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1
Creating Rapport with Virtual Agents

IVA '07 Proceedings of the 7th international conference on Intelligent Virtual Agents
Fluid Semantic Back-Channel Feedback in Dialogue: Challenges and Progress

IVA '07 Proceedings of the 7th international conference on Intelligent Virtual Agents
Uses of Contextual Knowledge in Mobile Robots

AI*IA '07 Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence on AI*IA 2007: Artificial Intelligence and Human-Oriented Computing
Predicting Listener Backchannels: A Probabilistic Multimodal Approach

IVA '08 Proceedings of the 8th international conference on Intelligent Virtual Agents
Agreeable People Like Agreeable Virtual Humans

IVA '08 Proceedings of the 8th international conference on Intelligent Virtual Agents
The effect of affective iconic realism on anonymous interactants' self-disclosure

CHI '09 Extended Abstracts on Human Factors in Computing Systems
A Context Model and Reasoning System to improve object tracking in complex scenarios

Expert Systems with Applications: An International Journal
Greta: an interactive expressive ECA system

Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
From the programmer's apprentice to human-robot interaction: thirty years of research on human-computer collaboration

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Virtual humans

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
The role of context in head gesture recognition

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Keyboard before Head Tracking Depresses User Success in Remote Camera Control

INTERACT '09 Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part II
A probabilistic multimodal approach for predicting listener backchannels

Autonomous Agents and Multi-Agent Systems
Co-occurrence graphs: contextual representation for head gesture recognition during multi-party interactions

Proceedings of the Workshop on Use of Context in Vision Processing
Don't just stare at me!

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
The effect of avatar realism of virtual humans on self-disclosure in anonymous social interactions

CHI '10 Extended Abstracts on Human Factors in Computing Systems
Can virtual humans be more engaging than real ones?

HCI'07 Proceedings of the 12th international conference on Human-computer interaction: intelligent multimodal interaction environments
Conditional sequence model for context-based recognition of gaze aversion

MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction
True emotion vs. social intentions in nonverbal communication: towards a synthesis for embodied conversational agents

ZiF'06 Proceedings of the Embodied communication in humans and machines, 2nd ZiF research group international conference on Modeling communication with robots and virtual humans
Classification of feedback expressions in multimodal data

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Spatiotemporal-boosted DCT features for head and face gesture analysis

HBU'10 Proceedings of the First international conference on Human behavior understanding
Learning backchannel prediction model from parasocial consensus sampling: a subjective evaluation

IVA'10 Proceedings of the 10th international conference on Intelligent virtual agents
Dimensional emotion prediction from spontaneous head gestures for interaction with sensitive artificial listeners

IVA'10 Proceedings of the 10th international conference on Intelligent virtual agents
Robust classification of face and head gestures in video

Image and Vision Computing
Recognition of hearing needs from body and eye movements to improve hearing instruments

Pervasive'11 Proceedings of the 9th international conference on Pervasive computing
Effect of time delays on agents' interaction dynamics

The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Virtual rapport 2.0

IVA'11 Proceedings of the 10th international conference on Intelligent virtual agents
It's in their eyes: a study on female and male virtual humans' gaze

IVA'11 Proceedings of the 10th international conference on Intelligent virtual agents
Virtual rapport

IVA'06 Proceedings of the 6th international conference on Intelligent Virtual Agents
Multimodal sensing, interpretation and copying of movements by a virtual agent

PIT'06 Proceedings of the 2006 international tutorial and research conference on Perception and Interactive Technologies
Proxemic feature recognition for interactive robots: automating metrics from the social sciences

ICSR'11 Proceedings of the Third international conference on Social Robotics
Annotating non-verbal behaviours in informal interactions

COST'10 Proceedings of the 2010 international conference on Analysis of Verbal and Nonverbal Communication and Enactment
Using self-context for multimodal detection of head nods in face-to-face interactions

Proceedings of the 14th ACM international conference on Multimodal interaction
Understanding the nonverbal behavior of socially anxious people during intimate self-disclosure

IVA'12 Proceedings of the 12th international conference on Intelligent Virtual Agents
Towards the automatic detection of spontaneous agreement and disagreement based on nonverbal behaviour: A survey of related cues, databases, and tools

Image and Vision Computing
HRI-2013 workshop on probabilistic approaches for robot control in human-robot interaction (PARC-HRI)

Proceedings of the 8th ACM/IEEE international conference on Human-robot interaction
Leveraging the robot dialog state for visual focus of attention recognition

Proceedings of the 15th ACM on International conference on multimodal interaction
Context aware addressee estimation for human robot interaction

Proceedings of the 6th workshop on Eye gaze in intelligent human machine interaction: gaze in multimodal interaction
Guest Editorial: Gesture and speech in interaction: An overview

Speech Communication

Abstract

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. We investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of user gestures. We present a recognition framework that (1) extracts contextual features from an ECA's dialog manager, (2) computes a prediction of head nods and head shakes, and (3) integrates the contextual predictions with the visual observations of a vision-based head gesture recognizer. We found a subset of lexical, punctuation, and timing features that are easily available in most ECA architectures and can be used to learn how to predict user feedback. Using a discriminative approach to contextual prediction and multi-modal integration, we were able to improve the performance of head gesture detection even when the topic of the test set was significantly different from that of the training set.
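
The three-step framework in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract says only that the approach is discriminative, so a hand-weighted logistic score stands in for whatever learned classifier the paper uses. The class AgentUtterance, the individual features, and every weight and threshold are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AgentUtterance:
    """One utterance from a hypothetical ECA dialog manager."""
    text: str
    end_time: float  # seconds at which the agent stops speaking


def contextual_features(utt: AgentUtterance, now: float) -> List[float]:
    """Step 1: lexical, punctuation and timing features (illustrative only)."""
    words = utt.text.lower().rstrip("?.!").split()
    # Lexical: does the utterance open like a yes/no question?
    yes_no_opening = 1.0 if words and words[0] in {"do", "is", "are", "can", "would"} else 0.0
    # Punctuation: does the utterance end with a question mark?
    is_question = 1.0 if utt.text.strip().endswith("?") else 0.0
    # Timing: 1.0 at the end of the utterance, falling to 0.0 two seconds away.
    near_end = max(0.0, 1.0 - abs(now - utt.end_time) / 2.0)
    return [yes_no_opening, is_question, near_end]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Assumed weights; a real system would learn these from annotated dialogs.
CONTEXT_WEIGHTS = (1.0, 1.5, 2.0)
CONTEXT_BIAS = -2.5


def contextual_prediction(features: List[float]) -> float:
    """Step 2: likelihood of a head nod predicted from context alone."""
    score = sum(w * f for w, f in zip(CONTEXT_WEIGHTS, features))
    return sigmoid(score + CONTEXT_BIAS)


def integrate(context_score: float, vision_score: float) -> float:
    """Step 3: fuse the contextual prediction with the vision-based score."""
    # Assumed fusion weights, again standing in for a learned integrator.
    return sigmoid(2.0 * context_score + 3.0 * vision_score - 2.5)


question = AgentUtterance("Do you want to continue?", end_time=10.0)
statement = AgentUtterance("I will now show you the next slide.", end_time=10.0)

vision_score = 0.5  # the same ambiguous visual evidence in both cases
fused_question = integrate(
    contextual_prediction(contextual_features(question, now=10.0)), vision_score)
fused_statement = integrate(
    contextual_prediction(contextual_features(statement, now=13.0)), vision_score)
print(fused_question > fused_statement)
```

With identical visual evidence, the fused score is higher just after a yes/no question than well after a plain statement, which is the effect the contextual predictor is meant to contribute.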