Audio-based head motion synthesis for avatar-based telepresence systems

  • Authors:
  • Zhigang Deng, Shrikanth Narayanan, Carlos Busso, Ulrich Neumann

  • Affiliations:
  • University of Southern California, Los Angeles, CA (all authors)

  • Venue:
  • Proceedings of the 2004 ACM SIGMM workshop on Effective telepresence
  • Year:
  • 2004

Abstract

In this paper, a data-driven, audio-based head motion synthesis technique is presented for avatar-based telepresence systems. First, the head motion of a human subject speaking a custom corpus is captured, and the accompanying audio features are extracted. From the resulting aligned audio-head motion pairs, a K-Nearest Neighbors (KNN) based dynamic programming algorithm synthesizes novel head motion for new audio input. The approach also provides optional, intuitive keyframe control over key head poses: once key head poses are specified, the method synthesizes head motion sequences that best satisfy both the speech and the key-pose constraints.
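The abstract only sketches the algorithm; the details are in the paper itself. For intuition, below is a minimal, hypothetical sketch of how a KNN-plus-dynamic-programming search of this kind could look, assuming frame-aligned audio feature vectors and head pose vectors and simple Euclidean costs. The function name `synthesize_head_motion` and the weights `w_smooth` and `w_key` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def synthesize_head_motion(audio_feats, db_audio, db_motion,
                           k=8, w_smooth=1.0, w_key=5.0, key_poses=None):
    """Pick one captured head pose per input frame via KNN + dynamic programming.

    audio_feats : (T, Da) audio feature vectors for the new utterance
    db_audio    : (N, Da) audio features of the captured corpus
    db_motion   : (N, Dm) head poses aligned frame-by-frame with db_audio
    key_poses   : optional {frame_index: (Dm,) pose} keyframe constraints
    """
    T = len(audio_feats)

    # Stage 1: for each input frame, keep the K database frames whose
    # audio features are closest (candidate lattice + matching costs).
    cand_idx = np.empty((T, k), dtype=int)
    cand_cost = np.empty((T, k))
    for t in range(T):
        d = np.linalg.norm(db_audio - audio_feats[t], axis=1)
        nn = np.argsort(d)[:k]
        cand_idx[t], cand_cost[t] = nn, d[nn]

    def key_penalty(t, dp):
        # Penalize candidates whose pose deviates from a specified key pose.
        if key_poses and t in key_poses:
            dp = dp + w_key * np.linalg.norm(
                db_motion[cand_idx[t]] - key_poses[t], axis=1)
        return dp

    # Stage 2: dynamic programming (Viterbi-style) over the lattice,
    # trading audio-matching cost against pose-transition smoothness.
    dp = key_penalty(0, cand_cost[0].copy())
    back = np.zeros((T, k), dtype=int)
    for t in range(1, T):
        prev = db_motion[cand_idx[t - 1]]          # (k, Dm) poses at t-1
        cur = db_motion[cand_idx[t]]               # (k, Dm) poses at t
        trans = np.linalg.norm(prev[:, None, :] - cur[None, :, :], axis=2)
        total = dp[:, None] + w_smooth * trans + cand_cost[t][None, :]
        back[t] = np.argmin(total, axis=0)         # best predecessor per candidate
        dp = key_penalty(t, total[back[t], np.arange(k)])

    # Stage 3: backtrack the minimum-cost path and emit its head poses.
    path = [int(np.argmin(dp))]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return db_motion[cand_idx[np.arange(T), path]]
```

A Viterbi-style pass like this visits O(T * k^2) transitions, so the search stays linear in the number of frames, and key-pose constraints drop in as cheap per-frame penalty terms rather than a separate optimization.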