Multi-speaker articulatory trajectory formation based on speaker-independent articulatory HMMs

Authors:
Sadao Hiroya;Takemi Mochida
Affiliations:
NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato-Wakamiya, Atsugi-shi, Kanagawa 243-0198, Japan;NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato-Wakamiya, Atsugi-shi, Kanagawa 243-0198, Japan
Venue:
Speech Communication
Year:
2006

Abstract

Inter-speaker variability in the speech spectrum domain has been modeled using speaker-adaptive training (SAT), in which speaker-independent, phoneme-specific hidden Markov models (HMMs) are used together with a speaker-adaptive matrix. This paper presents multi-speaker articulatory trajectory formation based on this method. Speaker-independent and speaker-specific features are statistically separated from a multi-speaker articulatory database, which consists of mid-sagittal motion data of the lips, incisor, and tongue measured with an electromagnetic articulographic (EMA) system. We evaluated the proposed method in terms of the RMS error between the measured and estimated articulatory parameters. When multi-speaker models of articulatory parameters with two speaker-adaptive matrices for each speaker were used, the average RMS error of the articulatory parameters was 1.29 mm, which showed no statistically significant difference from that for speaker-dependent models (1.22 mm). For comparison, multi-speaker models of the conventional speech spectrum were also constructed using a multi-speaker spectrum database, which consists of speech data recorded simultaneously with the articulatory measurements. The average spectral distance between the vocal-tract spectrum and the spectrum estimated from two-matrix models was 4.19 dB, which showed a statistically significant difference from that for speaker-dependent models (3.97 dB). These results indicate that modeling inter-speaker variability in the articulatory parameter domain with a small number of matrices per speaker almost perfectly approximates the speaker dependency of articulation and is better than modeling it in the speech spectrum domain.
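
The abstract does not give the underlying equations, so the following is only a minimal sketch of the two quantities it mentions: a speaker-adaptive matrix applied to a speaker-independent mean, and the RMS error between measured and estimated articulatory parameters. It assumes the affine form common in SAT and MLLR-style adaptation, in which the matrix acts on the mean vector extended with a leading 1. The function names `adapt_mean` and `rms_error` and all numeric values are hypothetical illustrations, not taken from the paper.

```python
import math


def adapt_mean(W, mu):
    """Apply a speaker-adaptive matrix W (d rows, d+1 columns) to a
    speaker-independent mean mu (length d).

    Assumption: affine form W @ [1, mu], as in common SAT/MLLR
    formulations; the paper's exact parameterization is not given
    in the abstract.
    """
    xi = [1.0] + list(mu)  # extended mean vector [1, mu_1, ..., mu_d]
    return [sum(w * x for w, x in zip(row, xi)) for row in W]


def rms_error(measured, estimated):
    """RMS error pooled over all frames and all parameter channels."""
    total = 0.0
    count = 0
    for m_frame, e_frame in zip(measured, estimated):
        for m, e in zip(m_frame, e_frame):
            total += (m - e) ** 2
            count += 1
    return math.sqrt(total / count)


# Toy values (hypothetical, in mm): one 2-D speaker-independent mean
# and one speaker-specific affine matrix (bias column first).
mu = [10.0, -5.0]
W = [[0.5, 1.0, 0.0],
     [-0.5, 0.0, 1.0]]

adapted = adapt_mean(W, mu)            # speaker-specific mean
measured = [[10.5, -5.5], [11.5, -5.5]]
estimated = [adapted, adapted]         # constant trajectory for illustration
print(adapted, rms_error(measured, estimated))
```

In this toy case only one of the four measured values differs from the estimate, by 1 mm, so the pooled RMS error is 0.5 mm. The sketch does not model HMM state sequences or trajectory smoothing, and it uses a single matrix per speaker, whereas the abstract reports results for two.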