Gesture synthesis adapted to speech emphasis

Authors:
Adso Fernández-Baena;Raúl Montaño;Marc Antonijoan;Arturo Roversi;David Miralles;Francesc Alías
Venue:
Speech Communication
Year:
2014

Abstract

Avatars communicate through speech and gestures to appear realistic and to enhance interaction with humans. In this context, several works have analyzed the relationship between speech and gestures, while others have focused on their synthesis, following different approaches. In this work, we address both goals: we link speech to gestures in terms of time and intensity, and then use this knowledge to drive a gesture synthesizer from a manually annotated speech signal. To that end, we define strength indicators for speech and motion. After validating them through perceptual tests, we obtain an intensity rule from their correlation. Moreover, we derive a synchrony rule to determine temporal correspondences between speech and gestures. These analyses have been conducted on aggressive and neutral performances, whose speech signal and motion have been manually annotated, to cover a broad range of emphatic levels. Next, the intensity and synchrony rules are used to drive a gesture synthesizer called gesture motion graph (GMG). The rules are validated through perceptual tests in which users evaluate GMG output animations. Results show that animations using both the intensity and synchrony rules perform better than those using only the synchrony rule (which in turn enhance realism with respect to random animation). Finally, we conclude that the extracted rules allow GMG to properly synthesize gestures adapted to speech emphasis from annotated speech.