This paper presents a realistic visual speech synthesis method based on hybrid concatenation. Unlike previous approaches built on phoneme-level unit selection or hidden Markov models (HMMs), the hybrid concatenation method combines frame-level unit selection with a fused HMM, and generates more expressive and stable facial animations. The fused HMM explicitly models the loose synchronization of tightly coupled audio and visual streams, yielding much better audiovisual mapping than a conventional HMM. Once the fused HMM is trained, facial animation is generated by unit selection at the frame level, driven by the fused HMM output probabilities. To make unit selection efficient on a large corpus, the paper also proposes a two-layer Viterbi search in which only the subsets selected in the first layer are examined further in the second layer; with this pruning, the system has been successfully integrated into real-time applications. Furthermore, the paper proposes a mapping method based on Gaussian mixture models (GMMs) that generates emotional facial expressions from neutral ones. Experiments show that the method synthesizes facial parameters with high quality, and that it outperforms other audiovisual mapping methods in expressiveness, stability, and running speed.
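The two-layer search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the corpus units, the quadratic emission score standing in for the fused-HMM output log-probability, the quadratic transition (smoothness) score, and the centroid-based first-layer pruning are all simplifying assumptions. The first layer ranks cluster centroids per frame and keeps only the top-scoring subsets; the second layer runs a full frame-level Viterbi restricted to units from those subsets.

```python
def emit_logp(frame, unit):
    # Proxy for the fused-HMM output log-probability of a corpus unit
    # given the observed frame (assumption: negative squared distance).
    return -(frame - unit) ** 2

def trans_logp(u1, u2):
    # Smoothness score between consecutive units (assumed quadratic).
    return -0.5 * (u2 - u1) ** 2

def viterbi(frames, cands):
    """Frame-level Viterbi over per-frame candidate unit lists."""
    # prev maps unit -> (best path score ending here, backpointer)
    prev = {u: (emit_logp(frames[0], u), None) for u in cands[0]}
    hist = [prev]
    for t in range(1, len(frames)):
        cur = {}
        for u in cands[t]:
            p, (score, _) = max(prev.items(),
                                key=lambda kv: kv[1][0] + trans_logp(kv[0], u))
            cur[u] = (score + trans_logp(p, u) + emit_logp(frames[t], u), p)
        hist.append(cur)
        prev = cur
    last = max(prev, key=lambda u: prev[u][0])
    path = [last]
    for t in range(len(frames) - 1, 0, -1):
        path.append(hist[t][path[-1]][1])
    return path[::-1]

def two_layer_search(frames, clusters, top_k=2):
    """Layer 1: rank cluster centroids per frame and keep the top_k
    subsets; layer 2: full Viterbi over units from those subsets only."""
    cands = []
    for f in frames:
        ranked = sorted(clusters,
                        key=lambda c: abs(f - sum(clusters[c]) / len(clusters[c])))
        cands.append([u for c in ranked[:top_k] for u in clusters[c]])
    return viterbi(frames, cands)
```

With `top_k` small, the second layer scores only a fraction of the corpus per frame, which is what makes the frame-level search tractable in real time; the paper's actual scoring uses fused-HMM probabilities rather than these toy distances.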