Lip-reading Furhat: audio visual intelligibility of a back projected animated face

  • Authors:
  • Samer Al Moubayed;Gabriel Skantze;Jonas Beskow

  • Affiliations:
  • Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden;Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden;Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden

  • Venue:
  • IVA'12 Proceedings of the 12th international conference on Intelligent Virtual Agents
  • Year:
  • 2012

Abstract

Back-projecting a computer-animated face onto a three-dimensional static physical model of a face is a promising technology that is gaining ground as a solution for building situated, flexible and human-like robot heads. In this paper, we first briefly describe Furhat, a back-projected robot head built for multimodal multiparty human-machine interaction, and its benefits over virtual characters and robotic heads; we then motivate the need to investigate the contribution that Furhat's face makes to speech intelligibility. We present an audio-visual speech intelligibility experiment in which 10 subjects listened to short sentences with a degraded speech signal. The experiment compares the gain in intelligibility from lip-reading a face shown on a 2D screen with that from a 3D back-projected face, from different viewing angles. The results show that audio-visual speech intelligibility holds when the avatar is projected onto a static face model (as in Furhat) and even, rather surprisingly, exceeds that of the flat display. This means that, despite the movement limitations that back-projected animated face models bring about, their audio-visual speech intelligibility is equal to, or even higher than, that of the same models shown on flat displays. At the end of the paper, we discuss several hypotheses on how to interpret the results and motivate future investigations to better explore the characteristics of visual speech perception of 3D projected faces.