In this paper we present an algorithm for automatically cloning a speaking human face for virtual reality applications and the construction of virtual worlds. A person trains the algorithm by speaking: the algorithm learns the articulatory movements and generates synthetic speech that sounds similar to the speech of the person who trained the system. By modeling the nonlinear mapping between articulatory movements and facial movements with a neural network, the algorithm generates facial movements, synchronized with the artificial utterance, that a human speaker would have produced while uttering it. Our algorithm is inspired by the mirror-neuron theory of speech production and learns the articulatory movements using a genetic optimization algorithm and a set of fuzzy rules. The algorithm reproduces an original utterance by minimizing the mean squared error between the synthetic and original utterances. In subjective listening tests, sentences generated with our model achieved an average phonetic accuracy of about 84%, and the naturalness of the generated face movements was rated at 82%. Experimental results and a case study are reported.
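The core optimization loop described above, a genetic search over articulatory parameters that minimizes the mean squared error between the synthetic and original utterances, can be sketched in Python. This is a minimal illustration only: the `synthesize` function below is a toy stand-in for the paper's articulatory synthesizer, and the population size, mutation noise, and parameter dimension are illustrative assumptions, not values from the paper.

```python
import random

def synthesize(params, n=64):
    # Toy stand-in for the articulatory speech synthesizer: maps a small
    # parameter vector to a "waveform" via a polynomial in normalized time.
    return [sum(p * ((t / n) ** i) for i, p in enumerate(params)) for t in range(n)]

def mse(a, b):
    # Mean squared error between two equal-length signals.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def evolve(target, dim=3, pop_size=30, generations=200, seed=0):
    # Genetic search: keep the fitter half of the population, then refill it
    # with averaged-and-mutated children of randomly paired survivors.
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: mse(synthesize(ind), target))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append([(x + y) / 2 + rng.gauss(0, 0.05)
                             for x, y in zip(a, b)])
        pop = elite + children
    return min(pop, key=lambda ind: mse(synthesize(ind), target))

# Recover hidden articulatory parameters from their synthesized "utterance".
true_params = [0.5, -0.2, 0.8]
target = synthesize(true_params)
best = evolve(target)
```

After a few hundred generations the best individual's synthetic signal should match the target far more closely than an untrained (all-zero) parameter vector; the paper's system performs the analogous search over real articulatory trajectories.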