Intelligent content production for a virtual speaker

  • Authors:
  • Karlo Smid, Igor S. Pandzic, Viktorija Radman

  • Affiliations:
  • Ericsson Nikola Tesla, Zagreb; Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb; Ericsson Nikola Tesla, Zagreb

  • Venue:
  • IMTCI'04 Proceedings of the Second international conference on Intelligent Media Technology for Communicative Intelligence
  • Year:
  • 2004

Abstract

We present a graphically embodied animated agent (a virtual speaker) capable of reading plain English text and rendering it in the form of speech accompanied by appropriate facial gestures. Our system uses lexical analysis of the English text and statistical models of facial gestures to automatically generate gestures related to the spoken text. It is intended for the automatic creation of realistically animated virtual speakers, such as newscasters and storytellers, and incorporates the characteristics of such speakers captured from training video clips. Our system is based on a visual text-to-speech system that generates lip movements synchronised with the generated speech. This is extended to include eye blinks, head and eyebrow motion, and a simple gaze-following behaviour. The result is full facial animation produced automatically from plain English text.
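The pipeline described above (lexical analysis of plain text, followed by statistical sampling of facial gestures per word) can be illustrated with a minimal sketch. All names, gesture labels, and probabilities below are hypothetical placeholders: the paper's actual models are trained from video clips of real speakers, whereas this sketch uses a crude function-word/content-word split and hand-picked weights purely to show the shape of the idea.

```python
import random

# Hypothetical gesture inventory with per-word-class probabilities.
# In the described system these distributions would come from
# statistical models trained on recordings of newscasters.
GESTURE_PROBS = {
    "content":  {"eyebrow_raise": 0.30, "head_nod": 0.40, "none": 0.30},
    "function": {"eyebrow_raise": 0.05, "head_nod": 0.10, "none": 0.85},
}

# Very small stop-word list standing in for real lexical analysis.
FUNCTION_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "it", "from"}

def word_class(word: str) -> str:
    """Crude lexical analysis: classify a word as function or content."""
    return "function" if word.lower().strip(".,") in FUNCTION_WORDS else "content"

def sample_gesture(word: str, rng: random.Random) -> str:
    """Sample one facial gesture for a word from its class distribution."""
    probs = GESTURE_PROBS[word_class(word)]
    gestures = list(probs.keys())
    weights = list(probs.values())
    return rng.choices(gestures, weights=weights)[0]

def annotate(text: str, seed: int = 0) -> list:
    """Return (word, gesture) pairs for each word of the input text."""
    rng = random.Random(seed)
    return [(w, sample_gesture(w, rng)) for w in text.split()]

if __name__ == "__main__":
    for word, gesture in annotate("The anchor reads the news from a script"):
        print(f"{word:>8} -> {gesture}")
```

A full system would replace the stop-word heuristic with proper lexical analysis, time-align the sampled gestures with the phoneme stream of the visual text-to-speech engine, and layer on continuous behaviours such as eye blinks and gaze following.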