The results reported in this article form part of a larger project aimed at perceptually realistic, speech-driven animation of three-dimensional human faces, including each speaker's individual nuances. We describe the audiovisual system developed for learning the spatio-temporal relationship between speech acoustics and facial animation, covering video and speech processing, pattern analysis, and MPEG-4 compliant facial animation for a given speaker. In particular, we propose a perceptual transformation of the speech spectral envelope, which is shown to capture the dynamics of articulatory movements. An efficient nearest-neighbor algorithm then predicts novel articulatory trajectories from these speech dynamics. The results are promising and suggest a new approach to modeling the synthetic lip motion of a given speaker driven by his or her own speech; they also offer clues toward more general, realistic cross-speaker animation.
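The abstract does not spell out the exact perceptual transform or the neighbor search, so the following is a minimal sketch under stated assumptions: a log-mel spectral envelope with appended frame-to-frame deltas standing in for the perceptual dynamic features, and plain k-nearest-neighbor regression mapping each acoustic frame onto an MPEG-4 facial animation parameter (FAP) vector. All function names, parameter values, and the choice of log-mel features here are illustrative assumptions, not the authors' method.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_fft, n_mels, sr):
    # Triangular filters spaced evenly on the mel (perceptual) scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def perceptual_envelope(x, sr=16000, frame=400, hop=160, n_mels=24):
    # Per-frame log-energy mel envelope: a stand-in for the paper's
    # perceptual transformation of the speech spectral envelope.
    fb = mel_filterbank(frame, n_mels, sr)
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    feats = np.empty((n_frames, n_mels))
    for t in range(n_frames):
        seg = x[t * hop : t * hop + frame] * window
        spec = np.abs(np.fft.rfft(seg)) ** 2
        feats[t] = np.log(fb @ spec + 1e-10)
    return feats

def with_deltas(feats):
    # Append first differences so each frame also carries the local
    # articulatory dynamics, not just the static envelope.
    d = np.vstack([feats[1:] - feats[:-1], np.zeros((1, feats.shape[1]))])
    return np.hstack([feats, d])

def knn_predict(train_feats, train_faps, test_feats, k=3):
    # Nearest-neighbor regression: for each test frame, average the FAP
    # vectors of the k acoustically closest training frames. Assumes the
    # training set has more than k frames.
    out = np.empty((len(test_feats), train_faps.shape[1]))
    for t, f in enumerate(test_feats):
        dist = np.sum((train_feats - f) ** 2, axis=1)
        idx = np.argpartition(dist, k)[:k]
        out[t] = train_faps[idx].mean(axis=0)
    return out
```

In this scheme the training data are simply time-aligned pairs of acoustic frames and tracked facial parameters from one speaker's recordings; predicting a novel trajectory is a frame-by-frame lookup followed by whatever smoothing the animation pipeline applies, which is what makes a nearest-neighbor formulation attractive for speaker-specific lip motion.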