Robust speaker-adaptive HMM-based text-to-speech synthesis

  • Authors:
  • Junichi Yamagishi; Takashi Nose; Heiga Zen; Zhen-Hua Ling; Tomoki Toda; Keiichi Tokuda; Simon King; Steve Renals

  • Affiliations:
  • Junichi Yamagishi, Simon King, Steve Renals: Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
  • Takashi Nose: Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Japan
  • Heiga Zen, Keiichi Tokuda: Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
  • Zhen-Hua Ling: iFlytek Speech Lab, University of Science and Technology of China, Hefei, Anhui, China
  • Tomoki Toda: Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2009

Abstract

This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called "HTS-2007," employs speaker adaptation (CSMAPLR+MAP, i.e., constrained structural maximum a posteriori linear regression followed by MAP adaptation), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better-quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison with several other speech synthesis techniques shows that the new system is very robust: it can build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences.
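
For context, here is a minimal sketch of the constrained-transform idea behind CSMAPLR. This is our own paraphrase in generic notation, not an excerpt from the paper: as in CMLLR, a single transform adapts both the mean and the covariance of each Gaussian, and the "structural MAP" part estimates each regression-class transform with its parent node's transform as the prior.

```latex
% Sketch (our notation): CMLLR-style constrained transform assumed by CSMAPLR.
% One transform W = [A, b] adapts both the mean and the covariance of each
% Gaussian, which keeps the adapted model equivalent to a linear transform
% of the observation features:
\[
  \hat{\mu} = A\mu + b, \qquad \hat{\Sigma} = A\,\Sigma\,A^{\top}.
\]
% Structural MAP: the transform at each node of the regression-class tree
% is estimated with the parent node's transform as the prior, so classes
% with little adaptation data O back off toward their parent:
\[
  \hat{W} = \operatorname*{arg\,max}_{W}\;
            p(O \mid \lambda, W)\; p\!\left(W \mid \hat{W}_{\mathrm{parent}}\right).
\]
```

Under this scheme, sparsely observed regression classes inherit most of their transform from higher tree levels, which is one reason the adaptation remains stable with small or uneven amounts of speech data.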