Head gestures for perceptual interfaces: The role of context in improving recognition

  • Authors:
  • Louis-Philippe Morency; Candace Sidner; Christopher Lee; Trevor Darrell

  • Affiliations:
  • MIT CSAIL, Cambridge, MA 02139, USA; BAE Systems AIT, Burlington, MA 01803, USA; Boston Dynamics, Waltham, MA 02451, USA; MIT CSAIL, Cambridge, MA 02139, USA

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2007

Abstract

Head pose and gesture offer several conversational grounding cues and are used extensively in face-to-face interaction among people. To recognize visual feedback accurately, humans often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper, we describe how contextual information can be used to predict visual feedback and improve recognition of head gestures in human-computer interfaces. Lexical, prosodic, timing, and gesture features can be used to predict a user's visual feedback during conversational dialog with a robotic or virtual agent. In non-conversational interfaces, context features based on user-interface system events can improve the detection of head gestures for dialog-box confirmation or document browsing. Our user study with prototype gesture-based components indicates quantitative and qualitative benefits of gesture-based confirmation over conventional alternatives. Using a discriminative approach to contextual prediction and multi-modal integration, head gesture detection performance improved with context features even when the topic of the test set differed significantly from that of the training set.
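
The abstract gives no implementation details, but the two-stage idea it describes (a discriminative contextual predictor whose output is fused with a vision-based gesture detector) can be sketched as follows. This is a minimal illustration in Python with scikit-learn, not the authors' pipeline: the synthetic data, the specific context features, and the use of `LinearSVC` for both stages are all assumptions made for the sketch.

```python
# Hypothetical sketch of discriminative contextual prediction plus
# multi-modal integration for head-gesture detection. Feature names,
# data, and the two-stage design are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 600

# Toy per-frame features, stand-ins for the cues named in the abstract:
#   context[:, 0] -> lexical cue (e.g., end of a yes/no question)
#   context[:, 1] -> prosodic pause
#   context[:, 2] -> time since last system utterance / UI event
#   vision        -> raw score from a vision-based head-nod detector
context = rng.normal(size=(n, 3))
vision = rng.normal(size=n)
labels = (0.8 * context[:, 0] + vision + rng.normal(scale=0.5, size=n)) > 1.0

train, test = slice(0, 450), slice(450, n)

# Stage 1: discriminative contextual predictor --
# "how likely is visual feedback right now?"
ctx_clf = LinearSVC().fit(context[train], labels[train])
ctx_score = ctx_clf.decision_function(context)

# Stage 2: multi-modal integration -- fuse the context score with the
# vision-based gesture score in a second discriminative classifier.
fused = np.column_stack([ctx_score, vision])
fusion_clf = LinearSVC().fit(fused[train], labels[train])

print("vision-only accuracy:",
      LinearSVC().fit(vision[train, None], labels[train])
                 .score(vision[test, None], labels[test]))
print("context-fused accuracy:", fusion_clf.score(fused[test], labels[test]))
```

In the paper's setting, the context features would come from the dialog manager or window-system events rather than random placeholders, but the structure is the same: the context classifier anticipates when feedback is likely, and the fusion step lets that anticipation raise or lower the vision detector's effective threshold.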