From conversational tooltips to grounded discourse: head pose tracking in interactive dialog systems

Authors:
Louis-Philippe Morency;Trevor Darrell
Affiliations:
MIT, Cambridge, MA;MIT, Cambridge, MA
Venue:
Proceedings of the 6th international conference on Multimodal interfaces
Year:
2004

Abstract

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. While the machine interpretation of these cues has previously been limited to output modalities, recent advances in face-pose tracking allow for systems that are robust and accurate enough to sense natural grounding gestures. We present the design of a module that detects these cues and show examples of its integration in three different conversational agents with varying degrees of discourse model complexity. Using a scripted discourse model and off-the-shelf animation and speech-recognition components, we demonstrate the use of this module in a novel "conversational tooltip" task, where additional information is spontaneously provided by an animated character when users attend to various physical objects or characters in the environment. We further describe the integration of our module in two systems where animated and robotic characters interact with users based on rich discourse and semantic models.