Spontaneous spoken dialogues with the Furhat human-like robot head

Authors:
Samer Al Moubayed;Jonas Beskow;Gabriel Skantze
Affiliations:
KTH, Stockholm, Sweden;KTH, Stockholm, Sweden;KTH, Stockholm, Sweden
Venue:
Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction
Year:
2014

Citing 3
Cited 0

Taming Mona Lisa: Communicating gaze faithfully in 2D and 3D facial projections
ACM Transactions on Interactive Intelligent Systems (TiiS)

IrisTK: a statechart-based toolkit for multi-party face-to-face interaction
Proceedings of the 14th ACM international conference on Multimodal interaction

Furhat: a back-projected human-like robot head for multiparty human-machine interaction
COST'11 Proceedings of the 2011 international conference on Cognitive Behavioural Systems

Abstract

Furhat [1] is a robot head that uses a back-projected animated face that is realistic and human-like in anatomy. Furhat relies on a state-of-the-art facial animation architecture that allows accurate synchronization of lip movements with speech, as well as the control and generation of non-verbal gestures, eye movements and facial expressions. Furhat is built to study, implement and validate patterns and models of human-human and human-machine situated, multi-party, multimodal communication. Such study demands the co-presence of the talking head in the interaction environment, something that cannot be achieved using virtual avatars displayed on flat screens [2,3]. In Furhat, the animated face is back-projected onto a translucent mask that is a printout of the animated model. The mask is then rigged on a 2DOF neck to allow control of head movements. Figure 1 shows a snapshot of Furhat in interaction.

In this demonstrator, we will show an advanced multimodal and multiparty spoken conversational system using Furhat, an anthropomorphic robot head that brings facial animation to a physical robot head through back-projection. In the system, multimodality is enabled using speech and rich visual input signals such as multi-person real-time face tracking and microphone tracking. The demonstrator will showcase a system that is able to carry out social dialogue with multiple interlocutors simultaneously, with rich output signals such as eye and head coordination, lip-synchronized speech synthesis, and non-verbal facial gestures used to regulate fluent and expressive multiparty conversations. The dialogue design is performed using the IrisTK [4] dialogue authoring toolkit developed at KTH. The system will also be able to act as a moderator in a quiz game, showing different strategies for regulating spoken situated interactions.
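
As an illustration of how a tracked face position might drive a 2DOF neck, the following is a minimal sketch and not the authors' implementation. The function name neck_angles, the coordinate frame (origin at the neck joint, x to the right, y up, z forward, in metres) and the pan/tilt limits are all assumptions made for this example; the abstract does not specify them.

```python
import math


def neck_angles(x, y, z, max_pan=90.0, max_tilt=30.0):
    """Return (pan, tilt) in degrees that point the head at (x, y, z).

    Hypothetical frame: origin at the neck joint, x right, y up, z forward.
    max_pan and max_tilt are assumed mechanical limits, not Furhat's real ones.
    """
    # Pan rotates about the vertical axis: angle of the target in the x-z plane.
    pan = math.degrees(math.atan2(x, z))
    # Tilt rotates about the horizontal axis: elevation above the x-z plane.
    tilt = math.degrees(math.atan2(y, math.hypot(x, z)))
    # Clamp both angles to the assumed joint limits.
    pan = max(-max_pan, min(max_pan, pan))
    tilt = max(-max_tilt, min(max_tilt, tilt))
    return pan, tilt
```

In a multiparty setting, a dialogue system could call such a function with the position of whichever tracked interlocutor it currently addresses, and let eye movements cover any remaining offset once the neck reaches its limits.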