Audio-visual prosody: perception, detection, and synthesis of prominence

  • Authors:
  • Samer Al Moubayed; Jonas Beskow; Björn Granström; David House

  • Affiliations:
  • Royal Institute of Technology KTH, Stockholm, Sweden (all authors)

  • Venue:
  • Proceedings of the Third COST 2102 international training school conference on Toward autonomous, adaptive, and context-aware multimodal interfaces: theoretical and practical issues
  • Year:
  • 2010

Abstract

In this chapter, we investigate the effects of facial prominence cues, realized as gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment was conducted in which acoustically degraded speech was presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raising gestures. The experiment shows that visual prominence gestures synchronized with auditory prominence significantly increase speech intelligibility compared to the same gestures added to the speech at random. We also present a study examining how the behavior of the talking head is perceived when gestures are added at pitch movements. Using eye-gaze tracking and questionnaires with 10 moderately hearing-impaired subjects, the gaze data show that when gestures are coupled with pitch movements, users look at the face in a fashion similar to how they look at a natural face, as opposed to when the face carries no gestures. The questionnaire results also show that these gestures significantly increase the perceived naturalness and helpfulness of the talking head.
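
The abstract does not specify how pitch movements are detected or how gestures are timed against them; the sketch below is only an illustrative assumption of how such a coupling could work. All names (`detect_pitch_prominences`, `schedule_gestures`), the semitone threshold, and the peak-picking rule are hypothetical and not taken from the paper.

```python
import numpy as np

def detect_pitch_prominences(f0, frame_rate=100.0, threshold_semitones=2.0):
    """Find times of prominent pitch movements in an F0 contour.

    f0: fundamental frequency in Hz per frame (0 or NaN for unvoiced frames).
    frame_rate: number of F0 frames per second.
    threshold_semitones: minimum excursion above the speaker's median F0.
    Returns a list of times (seconds) at which a gesture could be triggered.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = np.nan_to_num(f0) > 0
    # Work in semitones relative to the median F0 to normalize for pitch range.
    ref = np.median(f0[voiced]) if voiced.any() else 1.0
    st = np.where(voiced, 12.0 * np.log2(np.where(voiced, f0, ref) / ref), np.nan)

    times = []
    for i in range(1, len(st) - 1):
        # A prominent frame: voiced local F0 maximum sufficiently above the median.
        if (voiced[i] and st[i] >= threshold_semitones
                and st[i] >= np.nan_to_num(st[i - 1], nan=-np.inf)
                and st[i] > np.nan_to_num(st[i + 1], nan=-np.inf)):
            times.append(i / frame_rate)
    return times

def schedule_gestures(prominence_times, gesture="eyebrow_raise", duration=0.3):
    """Pair each detected prominence with a gesture event for the talking head."""
    return [{"gesture": gesture, "start": t, "duration": duration}
            for t in prominence_times]

if __name__ == "__main__":
    # Toy F0 contour (Hz) at 100 frames/s with two pitch peaks.
    f0 = [0, 0, 120, 125, 135, 160, 140, 125, 0, 0, 130, 150, 175, 150, 130, 0]
    print(schedule_gestures(detect_pitch_prominences(f0)))
```

The "random" control condition reported in the abstract would correspond to assigning the same number of gesture events at times drawn uniformly over the utterance instead of at the detected pitch peaks.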