Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

Authors:
J. Yamagishi;T. Kobayashi;Y. Nakano;K. Ogata;J. Isogai
Affiliations:
Center for Speech Technol. Res., Univ. of Edinburgh, Edinburgh;-;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Citing 0
Cited 17

Review: Statistical parametric speech synthesis

Speech Communication
Integrating articulatory features into HMM-based parametric speech synthesis

IEEE Transactions on Audio, Speech, and Language Processing
Robust speaker-adaptive HMM-based text-to-speech synthesis

IEEE Transactions on Audio, Speech, and Language Processing
Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis

Speech Communication
Voice conversion using partial least squares regression

IEEE Transactions on Audio, Speech, and Language Processing
Voice conversion based on weighted frequency warping

IEEE Transactions on Audio, Speech, and Language Processing
Thousands of voices for HMM-based speech synthesis: analysis and application of TTS systems built on various ASR corpora

IEEE Transactions on Audio, Speech, and Language Processing
Synthesis of child speech with HMM adaptation and voice conversion

IEEE Transactions on Audio, Speech, and Language Processing
Improvements of Hungarian hidden Markov model-based text-to-speech synthesis

Acta Cybernetica
Czech HMM-based speech synthesis: experiments with model adaptation

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Voice banking and voice reconstruction for MND patients

The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility
Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping

Speech Communication
Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

Computer Speech and Language
Evaluating the intelligibility benefit of speech modifications in known noise conditions

Speech Communication
Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise

Computer Speech and Language
Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion

Computer Speech and Language
Synthesis of Spontaneous Speech With Syllable Contraction Using State-Based Context-Dependent Voice Transformation

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.