Audiovisual alignment in a face-to-face conversation translation framework

Authors:
Jerneja Žganec Gros; Aleš Mihelič
Affiliations:
Alpineon Research and Development, Ljubljana, Slovenia; Alpineon Research and Development, Ljubljana, Slovenia
Venue:
BioID_MultiComm'09 Proceedings of the 2009 joint COST 2101 and 2102 international conference on Biometric ID management and multimodal communication
Year:
2009

Abstract

This paper presents recent improvements in audiovisual alignment for a translating videophone. A method for audiovisual alignment in the target language is proposed, and the process of audiovisual speech synthesis is described. The method has been evaluated in the VideoTRAN translating videophone environment, in which an H.323 software client allows a set of multimodal verbal and nonverbal cues to be transmitted and translated in a multilingual face-to-face communication setting. An extension of the fluency and adequacy metrics commonly used in subjective machine translation evaluation is proposed for use in an audiovisual translation environment.