As we articulate speech, we usually move our heads and exhibit various facial expressions. This visual aspect of speech aids understanding and helps communicate additional information, such as the speaker's mood. In this paper we quantitatively analyze the head and facial movements that accompany speech and investigate how they relate to the text's prosodic structure.

We recorded several hours of speech and measured the locations of the speakers' main facial features as well as their head poses. The text was evaluated with a prosody prediction tool that identifies phrase boundaries and pitch accents. Characteristic of most speakers are simple motion patterns that are repeatedly applied in synchrony with the main prosodic events. The direction and strength of head movements vary widely from one speaker to another, yet their timing is typically well synchronized with the spoken text.

Understanding quantitatively the correlations between head movements and spoken text is important for synthesizing photo-realistic talking heads. Talking heads appear much more engaging when they exhibit realistic motion patterns.
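The kind of analysis described above can be illustrated with a minimal sketch: given a head-angle track and the frame indices of prosodic events, compare motion strength near the events against the overall average. All names and data here are hypothetical — in the actual study the head poses come from recorded video and the event times from a prosody prediction tool.

```python
# Hedged sketch: does head motion cluster around prosodic events?
# Synthetic data; function names are illustrative, not from the paper.

def motion_magnitude(angles):
    """Frame-to-frame absolute change in head angle (a proxy for motion strength)."""
    return [abs(b - a) for a, b in zip(angles, angles[1:])]

def mean_motion_near_events(motion, event_frames, window=2):
    """Average motion magnitude within +/-window frames of each prosodic event."""
    vals = []
    for e in event_frames:
        lo, hi = max(0, e - window), min(len(motion), e + window + 1)
        vals.extend(motion[lo:hi])
    return sum(vals) / len(vals) if vals else 0.0

# Synthetic head-pitch track: brief nods at frames 10 and 30, which we
# assume coincide with predicted pitch accents.
pitch = [0.0] * 50
for nod in (10, 30):
    pitch[nod] = 5.0

motion = motion_magnitude(pitch)
near = mean_motion_near_events(motion, [10, 30], window=2)
overall = sum(motion) / len(motion)
print(near > overall)  # prints True: motion is stronger around the accents
```

A fuller analysis would use cross-correlation between the motion signal and the event train to estimate the typical lag, but the windowed comparison above already captures the synchrony claim in miniature.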