Inter-speaker synchronization in audiovisual database for lip-readable speech to animation conversion

  • Authors:
  • Gergely Feldhoffer; Balázs Oroszi; György Takács; Attila Tihanyi; Tamás Bárdi

  • Affiliations:
  • Faculty of Information Technology, Péter Pázmány Catholic University, Budapest, Hungary (all authors)

  • Venue:
  • TSD'07: Proceedings of the 10th International Conference on Text, Speech and Dialogue
  • Year:
  • 2007

Abstract

The present study proposes an inter-speaker audiovisual synchronization method to decrease the speaker dependency of our direct speech-to-animation conversion system. Our aim is to convert an everyday speaker's voice into lip-readable facial animation for hearing-impaired users. This conversion needs mixed training data: acoustic features from normal speakers coupled with visual features from professional lip-speakers. Audio and video data of normal and professional speakers were synchronized with the Dynamic Time Warping (DTW) method. The quality and usefulness of the synchronization were investigated in a subjective test by measuring noticeable conflicts between the audio and visual parts of speech stimuli. An objective test was also carried out by training a neural network on the synchronized audiovisual data with an increasing number of speakers.
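
A minimal sketch of the alignment step described above, assuming frame-wise acoustic features (e.g. MFCC vectors) and numpy; this illustrates standard dynamic time warping, not the authors' implementation, and all variable names are hypothetical:

    import numpy as np

    def dtw_path(ref, other):
        """Align two feature sequences (frames x dims) with dynamic time warping.

        Returns the optimal warping path as a list of (i, j) index pairs.
        Feature extraction is assumed to happen upstream.
        """
        n, m = len(ref), len(other)
        # Local cost: Euclidean distance between every pair of frames.
        cost = np.linalg.norm(ref[:, None, :] - other[None, :, :], axis=2)
        # Accumulated cost with the standard step pattern (match, insert, delete).
        acc = np.full((n + 1, m + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i, j] = cost[i - 1, j - 1] + min(
                    acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
        # Backtrack from (n, m) to recover the warping path.
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]

    # Hypothetical usage: align a normal speaker's acoustic frames with a
    # professional lip-speaker's, then warp the lip-speaker's visual
    # parameters onto the normal speaker's timeline.
    # path = dtw_path(normal_mfcc, pro_mfcc)
    # warped_visual = np.array([pro_visual[j] for _, j in path])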