Fusion of children's speech and 2D gestures when conversing with 3D characters

Authors:
Jean-Claude Martin;Stéphanie Buisine;Guillaume Pitel;Niels Ole Bernsen
Affiliations:
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI-CNRS), Orsay Cedex, France;Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI-CNRS), Orsay Cedex, France;Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI-CNRS), Orsay Cedex, France;Natural Interactive Systems Lab, Odense M, Denmark
Venue:
Signal Processing - Special section: Multimodal human-computer interfaces
Year:
2006


Abstract

Most existing multimodal prototypes that let users combine 2D gestures and speech input are task-oriented. They help adult users solve particular information tasks, often in standard 2D graphical user interfaces. This paper describes the NICE Andersen system, which aims to demonstrate multimodal conversation between humans and embodied historical and literary characters. The target users are children and teenagers aged 10 to 18. We discuss issues in 2D gesture recognition and interpretation, as well as the temporal and semantic dimensions of input fusion, ranging from system and component design through technical evaluation and user evaluation with two different user groups. We observed that recognition and understanding of spoken deictics were quite robust and that spoken deictics were always used in multimodal input. We identified the causes of the most frequent input fusion failures and suggest possible improvements for removing these errors. The concluding discussion summarises what the NICE Andersen system has shown about how children gesture and combine their 2D gestures with speech when conversing with a 3D character, and looks at some of the challenges facing theoretical solutions aimed at supporting unconstrained speech/2D gesture fusion.