Speech-driven cartoon animation with emotions

Authors:
Yan Li;Feng Yu;Ying-Qing Xu;Eric Chang;Heung-Yeung Shum
Affiliations:
Microsoft Research China, Beijing, China;Tsinghua University, China;Microsoft Research China, Beijing, China;Microsoft Research China, Beijing, China;Microsoft Research China, Beijing, China
Venue:
MULTIMEDIA '01 Proceedings of the ninth ACM international conference on Multimedia
Year:
2001

Abstract

In this paper, we present a cartoon face animation system for multimedia HCI applications. We animate cartoon faces not only from the input speech, but also from emotions derived from the speech signal. Using a corpus of over 700 utterances from different speakers, we have trained SVMs (support vector machines) to recognize four categories of emotion: neutral, happiness, anger and sadness. For each input speech phrase, we describe its emotional content as a mixture of all four emotions rather than classifying it into a single emotion. Facial expressions are then generated from the recovered emotion for each phrase by morphing cartoon templates that correspond to the different emotions. To ensure smooth transitions in the animation, we apply low-pass filtering to the recovered (and possibly jumpy) emotion sequence. In addition, lip-syncing produces the lip movement from speech by recovering a statistical audio-visual mapping. Experimental results demonstrate that the cartoon animation sequences generated by our system are of good and convincing quality.
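
The steps from per-phrase emotion mixture to facial expression can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the per-emotion scores are taken as given rather than produced by trained SVMs, the low-pass filter is a simple exponential moving average (the abstract does not say which filter is used), and the morphing step is replaced by a linear blend of template control points. The function names `normalize`, `smooth` and `blend`, the parameter `alpha`, and the toy template values are all hypothetical.

```python
from typing import List, Sequence

# The four emotion categories named in the abstract, in a fixed order.
EMOTIONS = ("neutral", "happiness", "anger", "sadness")


def normalize(scores: Sequence[float]) -> List[float]:
    """Turn per-emotion scores into mixture weights that sum to 1."""
    clipped = [max(0.0, s) for s in scores]
    total = sum(clipped)
    if total == 0.0:
        # No evidence for any emotion: fall back to a uniform mixture.
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]


def smooth(sequence: Sequence[Sequence[float]], alpha: float = 0.5) -> List[List[float]]:
    """Low-pass filter a sequence of mixtures (assumed: exponential moving average).

    Each output is a convex combination of the current mixture and the
    previous output, so weights that sum to 1 still sum to 1 afterwards.
    """
    out: List[List[float]] = []
    for mix in sequence:
        if not out:
            out.append(list(mix))
        else:
            prev = out[-1]
            out.append([alpha * m + (1.0 - alpha) * p for m, p in zip(mix, prev)])
    return out


def blend(templates: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of per-emotion template control points.

    This linear blend is a stand-in for the template morphing step.
    """
    n = len(templates[0])
    return [sum(w * t[i] for w, t in zip(weights, templates)) for i in range(n)]


# Hypothetical per-phrase scores for three consecutive phrases.
raw = [[0.1, 0.8, 0.1, 0.0], [0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
mixtures = [normalize(r) for r in raw]
smoothed = smooth(mixtures, alpha=0.5)

# Toy templates: one scalar (say, mouth-corner height) per emotion.
templates = [[0.0], [1.0], [-0.5], [-1.0]]
frames = [blend(templates, m) for m in smoothed]
```

In this sketch, a smaller `alpha` gives smoother but slower expression changes, which is the trade-off any low-pass filter on the emotion sequence has to balance.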