Lip movement synthesis from speech based on hidden Markov models
Speech Communication - Special issue on auditory-visual speech processing
Audio-Visual Interaction in Multimedia Communication
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) -Volume 1 - Volume 1
Cross-modal prediction in audio-visual communication
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 04
SynFace: speech-driven facial animation for virtual speech-reading support
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on animating virtual speakers or singers from audio: Lip-synching facial animation
Hi-index | 0.02 |
In this paper, we investigate a Hidden Markov Model (HMM)-based method to drive a lip movement sequence with input speech. In a previous study, we have already investigated a mapping method based on the Viterbi decoding algorithm which converts an input speech signal to a lip movement sequence through the most likely HMM state sequence using audio HMMs. However, the method can result in errors due to incorrectly decoded HMM states. This paper proposes a method to re-estimate visual parameters using HMMs of audio-visual joint probability using the Expectation-Maximization (EM) algorithm. In the experiments, the proposed mapping method results in a 26% error reduction when compared to the Viterbi-based algorithm at incorrectly decoded bilabial consonants.