We propose a new two-stage framework for the joint analysis of a speaker's head gesture and speech prosody patterns, with the goal of automatically synthesizing realistic head gestures from speech prosody. In the first stage, we perform Hidden Markov Model (HMM) based unsupervised temporal segmentation of the head gesture and speech prosody features separately, to determine the elementary head gesture patterns and the elementary speech prosody patterns of a particular speaker. In the second stage, we jointly analyze the correlations between these elementary head gesture and prosody patterns using Multi-Stream HMMs to determine an audio-visual mapping model. The resulting audio-visual mapping model is then employed to synthesize natural head gestures from arbitrary input test speech, given a head model for the speaker. In the synthesis stage, the audio-visual mapping model is used to predict a sequence of gesture patterns from the prosody pattern sequence computed for the input test speech. The Euler angles associated with each gesture pattern are then applied to animate the speaker's head model. Objective and subjective evaluations indicate that the proposed synthesis-by-analysis scheme provides natural-looking head gestures for the speaker with any input test speech, as well as in "prosody transplant" and "gesture transplant" scenarios.
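The synthesis stage described above (predicting a gesture pattern sequence from a prosody pattern sequence) can be illustrated with a minimal sketch. It is not the Multi-Stream HMM of the paper: it swaps in a plain discrete HMM in which the hidden states are gesture pattern labels and the observations are prosody pattern labels, decoded with the standard Viterbi algorithm. The function name `viterbi`, the two-label vocabularies, and every probability in `START`, `TRANS`, and `EMIT` are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical toy model (all numbers are illustrative assumptions):
# gesture patterns: 0 = "still", 1 = "nod"; prosody patterns: 0 = "flat", 1 = "accented".
START = [0.6, 0.4]                 # P(first gesture pattern)
TRANS = [[0.7, 0.3], [0.4, 0.6]]   # P(next gesture | current gesture)
EMIT = [[0.9, 0.1], [0.2, 0.8]]    # P(prosody pattern | gesture pattern)


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely gesture-label sequence for a prosody-label sequence.

    Works in the log domain; assumes every probability is strictly positive
    and that obs is a non-empty list of integer prosody labels.
    """
    n_states = len(start_p)
    # Log-score of the best path ending in each state after the first observation.
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this step.
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][o]))
        backpointers.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    prosody_labels = [0, 0, 1, 1, 0]
    print(viterbi(prosody_labels, START, TRANS, EMIT))
```

In a full system along the lines of the abstract, each predicted gesture label would then be expanded into its associated Euler angle trajectory to drive the head model; that step is omitted here because the abstract does not specify how the trajectories are stored.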