WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices

  • Authors:
  • Éva Székely, Zeeshan Ahmed, João P. Cabral, Julie Carson-Berndsen

  • Affiliations:
  • University College Dublin, Dublin, Ireland (all authors)

  • Venue:
  • SLPAT '12 Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies
  • Year:
  • 2012

Abstract

This paper describes a demonstration of the WinkTalk system, a speech synthesis platform that uses expressive synthetic voices. With the help of a web camera and facial expression analysis, the system allows users to control the expressive features of the synthetic speech for a particular utterance through their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects the voice that best matches their face at the moment of sending a message. WinkTalk is an early research prototype that aims to demonstrate that facial expressions provide more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech generating devices.
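
The abstract does not specify how the personalised mapping is implemented, but the selection step it describes can be illustrated with a minimal sketch: record a few facial-expression feature vectors per voice during a per-user calibration phase, then at message-send time pick the voice whose calibrated profile is nearest to the current expression. The Python below is a hypothetical illustration under those assumptions; the class, the nearest-centroid rule, and the three voice labels are invented for this sketch and are not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of WinkTalk's voice-selection step. The paper does not
# publish its mapping algorithm; the voice labels, features, and the
# nearest-centroid rule below are illustrative assumptions.

VOICES = ("cheerful", "neutral", "tender")  # three expressive voices (labels assumed)


class PersonalisedVoiceMapper:
    """Maps a user's facial-expression feature vector to one of three voices."""

    def __init__(self):
        # One list of calibration feature vectors per voice, gathered while
        # the user poses the expression they want linked to that voice.
        self.samples = {voice: [] for voice in VOICES}
        self.centroids = {}

    def calibrate(self, voice, features):
        """Record one calibration frame (e.g. expression intensities) for a voice."""
        self.samples[voice].append(np.asarray(features, dtype=float))

    def finalise(self):
        """Compute one centroid per voice from its calibration frames."""
        self.centroids = {
            voice: np.mean(frames, axis=0)
            for voice, frames in self.samples.items()
            if frames
        }

    def select_voice(self, features):
        """Pick the voice whose calibrated centroid is nearest the current frame."""
        features = np.asarray(features, dtype=float)
        return min(
            self.centroids,
            key=lambda voice: np.linalg.norm(features - self.centroids[voice]),
        )


# Usage: calibrate per user, then classify the expression at message-send time.
mapper = PersonalisedVoiceMapper()
mapper.calibrate("cheerful", [0.9, 0.1, 0.2])  # smile-dominant frame
mapper.calibrate("neutral", [0.1, 0.1, 0.1])
mapper.calibrate("tender", [0.2, 0.1, 0.8])
mapper.finalise()
print(mapper.select_voice([0.8, 0.2, 0.3]))  # -> "cheerful"
```

A per-user calibration of this kind is one plausible way to realise the "personalised mapping" the abstract refers to, since the same facial expression can carry different expressive intent for different users.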