Furhat: a back-projected human-like robot head for multiparty human-machine interaction
COST'11: Proceedings of the 2011 International Conference on Cognitive Behavioural Systems
Back-projecting a computer-animated face onto a three-dimensional static physical model of a face is a promising technology that is gaining ground as a solution for building situated, flexible and human-like robot heads. In this paper, we first briefly describe Furhat, a back-projected robot head built for multimodal multiparty human-machine interaction, and its benefits over virtual characters and robotic heads; we then motivate the need to investigate the contribution that Furhat's face makes to speech intelligibility. We present an audio-visual speech intelligibility experiment in which 10 subjects listened to short sentences with a degraded speech signal. The experiment compares the gain in intelligibility obtained from lip-reading a face visualized on a 2D screen with that obtained from a 3D back-projected face, viewed from different angles. The results show that the audio-visual speech intelligibility gain holds when the avatar is projected onto a static face model (as in Furhat) and, rather surprisingly, even exceeds that of the 2D display. This means that despite the movement limitations that back-projected animated face models bring about, their audio-visual speech intelligibility is equal to, or even higher than, that of the same models shown on flat displays. At the end of the paper we discuss several hypotheses on how to interpret the results, and motivate future investigations to better explore the characteristics of visual speech perception in 3D-projected faces.
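The abstract does not specify how intelligibility was scored; a common convention in audio-visual speech experiments is the proportion of target keywords correctly reported per condition, with the visual gain computed against an audio-only baseline. The sketch below illustrates that kind of scoring; every name and data value in it (keyword_accuracy, condition_score, the sample trials) is a hypothetical illustration, not taken from the paper.

```python
# Minimal sketch of keyword-based intelligibility scoring, assuming
# transcription-style responses per trial. All data structures and
# names are hypothetical illustrations, not from the paper.

def keyword_accuracy(response: str, keywords: list[str]) -> float:
    """Fraction of target keywords that appear in the subject's response."""
    words = set(response.lower().split())
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits / len(keywords)

def condition_score(trials: list[dict]) -> float:
    """Mean keyword accuracy over all trials in one viewing condition."""
    scores = [keyword_accuracy(t["response"], t["keywords"]) for t in trials]
    return sum(scores) / len(scores)

# Hypothetical usage: intelligibility gain of one visual condition
# over the audio-only baseline, with invented example trials.
audio_only = condition_score([
    {"response": "the hat sat on a map", "keywords": ["cat", "sat", "mat"]},
])
projected_3d = condition_score([
    {"response": "the cat sat on the mat", "keywords": ["cat", "sat", "mat"]},
])
gain = projected_3d - audio_only
print(f"intelligibility gain over audio-only: {gain:+.2f}")
```

Under such a scheme, the paper's comparison reduces to contrasting the gain of the 2D-screen condition with that of the 3D back-projected condition at each viewing angle.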