Audiovisual text-to-speech systems convert written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, which may be either natural or synthesized speech. However, how mismatches between these two information streams are perceived needs to be explored experimentally, since such mismatches could degrade the quality of the output. To increase the intermodal coherence of synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments that contain original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as more synchronous than those of systems in which the two modes are not intrinsically coherent. Furthermore, the degree of coherence between the auditory and visual modes is shown to influence the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived quality of the visual signal.
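The abstract does not specify the cost functions used, so the following is only a minimal Python sketch, under our own assumptions, of what unit selection over multimodal segments can look like. Each database unit keeps its original audio and video together, and a Viterbi search penalizes concatenations in both modes at once. The `Unit` fields, the binary `target_cost`, the Euclidean `join_cost`, and the weights `w_audio` and `w_video` are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Unit:
    """Hypothetical multimodal database segment: audio and video cut together."""
    label: str                # phoneme label of the segment
    audio_start: List[float]  # acoustic features at the segment's first frame
    audio_end: List[float]    # acoustic features at the segment's last frame
    video_start: List[float]  # visual (e.g. mouth-shape) features at the first frame
    video_end: List[float]    # visual features at the last frame


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def target_cost(unit: Unit, target_label: str) -> float:
    # Assumed binary target cost: 0 if the phoneme matches, 1 otherwise.
    return 0.0 if unit.label == target_label else 1.0


def join_cost(prev: Unit, nxt: Unit, w_audio: float, w_video: float) -> float:
    # Concatenation penalty measured in BOTH modes, so a joint that is smooth
    # acoustically but jumps visually (or vice versa) is still penalized.
    return (w_audio * dist(prev.audio_end, nxt.audio_start)
            + w_video * dist(prev.video_end, nxt.video_start))


def select_units(targets: List[str], candidates: List[List[Unit]],
                 w_audio: float = 1.0, w_video: float = 1.0) -> List[Unit]:
    """Viterbi search for the cheapest sequence of multimodal units."""
    n = len(targets)
    # cost[i][j]: cheapest total cost of a path ending in candidate j at position i
    cost = [[target_cost(u, targets[0]) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row: List[float] = []
        brow: List[int] = []
        for u in candidates[i]:
            # Total cost of reaching u from each predecessor k.
            via = [cost[i - 1][k] + join_cost(p, u, w_audio, w_video)
                   for k, p in enumerate(candidates[i - 1])]
            best_k = min(range(len(via)), key=via.__getitem__)
            row.append(via[best_k] + target_cost(u, targets[i]))
            brow.append(best_k)
        cost.append(row)
        back.append(brow)
    # Trace back from the cheapest final candidate.
    j = min(range(len(candidates[-1])), key=cost[-1].__getitem__)
    path: List[Unit] = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]


if __name__ == "__main__":
    a1 = Unit("a", [0.0], [0.0], [0.0], [0.0])
    b1 = Unit("b", [0.0], [0.0], [5.0], [0.0])  # smooth audio join, large visual jump
    b2 = Unit("b", [1.0], [0.0], [0.0], [0.0])  # small audio jump, smooth visual join
    print([u is b2 for u in select_units(["a", "b"], [[a1], [b1, b2]])])
```

In this toy setup, setting `w_video=0.0` reduces the search to audio-only selection, which picks `b1` despite its visual discontinuity. With both weights at 1.0 the search prefers `b2`, whose joint is coherent in both modes. Because every unit is an original recording, the audio and video inside a unit remain intrinsically synchronous; only the joints between units can introduce mismatches.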