Viseme Recognition Experiment Using Context Dependent Hidden Markov Models

Authors:
Soonkyu Lee;Dongsuk Yook
Affiliations:
-;-
Venue:
IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
Year:
2002

Citing 1
Cited 1

Adaptive fusion of acoustic and visual sources for automatic speech recognition

Speech Communication - Special issue on auditory-visual speech processing

Speech driven facial animation using HMMs in basque

TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue

Quantified Score

Hi-index	0.00

Visualization

Abstract

Visual images synchronized with audio signals can provide user-friendly interface for man machine interactions. The visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches in using HMMs. In the first approach, an HMM is trained for each triviseme which is a viseme with its left and right context, and the audio signals are directly recognized as a sequence of trivisemes. In the second approach, each triphone is modeled with an HMM, and a general triphone recognizer is used to produce a triphone sequence from the audio signals. The triviseme or triphone sequence is then converted to a viseme sequence. The performances of the two viseme recognition systems are evaluated on the TIMIT speech corpus.