Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation

  • Authors:
  • Tatsuo Yotsukura; Shigeo Morishima; Satoshi Nakamura

  • Affiliations:
  • The authors are with ATR Spoken Language Communication Research Laboratories, Kyoto-fu, 619-0288 Japan. E-mail: tatsuo.yotsukura@atr.jp

  • Venue:
  • IEICE - Transactions on Information and Systems
  • Year:
  • 2005


Abstract

An accurate audio-visual speech corpus is indispensable for talking-head research. This paper presents our collection of an audio-visual speech corpus and proposes a head-movement normalization method and a facial motion generation method. The corpus contains speech data, video data of faces, and the positions and movements of facial organs, and consists of Japanese phoneme-balanced sentences uttered by a female native speaker. Accurate facial capture is realized with an optical motion-capture system: we captured high-resolution 3D data by arranging many markers on the speaker's face. In addition, we propose a method of acquiring facial movements while removing head movements, using an affine transformation to compute the displacements of the facial organs alone. Finally, to make it easy to create facial animation from this motion data, we propose a technique for assigning the captured data to a facial polygon model. Evaluation results demonstrate the effectiveness of the proposed facial motion generation method and show the relationship between the number of markers and the resulting errors.
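
The abstract's head-movement normalization step can be sketched as follows: estimate an affine transform from a subset of "rigid" markers that move only with the head (e.g., on the forehead), then map every marker back into a reference head pose so that only pure facial-organ displacement remains. This is a minimal illustrative sketch of that idea, not the paper's implementation; the marker indexing, function names, and the choice of a least-squares affine fit are assumptions.

```python
# Hedged sketch of affine head-movement normalization, assuming (M, 3) arrays
# of marker positions per frame and a known set of rigid head-marker indices.
# All names here are hypothetical, not taken from the paper.
import numpy as np

def estimate_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 3D affine transform (4x4) mapping src points onto dst.

    src, dst: (N, 3) arrays of corresponding marker positions, N >= 4.
    """
    n = src.shape[0]
    src_h = np.hstack([src, np.ones((n, 1))])   # homogeneous coordinates (N, 4)
    # Solve src_h @ a ~= dst in the least-squares sense; a has shape (4, 3).
    a, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    affine = np.eye(4)
    affine[:3, :] = a.T                         # embed as a 4x4 transform
    return affine

def remove_head_motion(frame: np.ndarray, reference: np.ndarray,
                       rigid_idx: list[int]) -> np.ndarray:
    """Map one frame of markers back into the reference head pose.

    frame, reference: (M, 3) marker positions; rigid_idx selects markers
    that move with the head but not with speech articulation.
    """
    affine = estimate_affine(frame[rigid_idx], reference[rigid_idx])
    frame_h = np.hstack([frame, np.ones((frame.shape[0], 1))])
    return (frame_h @ affine.T)[:, :3]

# Usage: displacement of each marker relative to a neutral frame, with head
# movement factored out.
#   neutral = markers[0]
#   aligned = remove_head_motion(markers[t], neutral, head_idx)
#   facial_displacement = aligned - neutral
```

Fitting the transform only to head-rigid markers is what makes the residual motion attributable to the facial organs; fitting to all markers would instead absorb part of the articulation into the head-pose estimate.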