Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005

  • Authors:
  • Heiga Zen; Tomoki Toda; Masaru Nakamura; Keiichi Tokuda

  • Affiliations:
  • The authors are with the Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya-shi, 466-8555 Japan. E-mail: zen@ics.nitech.ac.jp, masha@ics.nitech.ac.jp, ...
  • The author is with the Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma-shi, 630-0192 Japan. E-mail: tomoki@is.naist.jp

  • Venue:
  • IEICE - Transactions on Information and Systems
  • Year:
  • 2007

Abstract

In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named the Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge, entering an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system, and then describe the new features integrated into Nitech-HTS 2005: STRAIGHT-based vocoding, hidden semi-Markov model (HSMM) based acoustic modeling, and a speech parameter generation algorithm that considers global variance (GV). Voices built with Nitech-HTS 2005 can generate speech waveforms at 0.3 × RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and the footprint of each voice is less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.
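As a rough illustration of the real-time ratio quoted in the abstract, the sketch below assumes the usual definition: processing time divided by the duration of the generated audio, so values below 1 mean synthesis runs faster than playback. The function names and the 10-second utterance are hypothetical and do not come from the paper.

```python
def real_time_ratio(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, ratio: float = 0.3) -> float:
    """Estimate the time needed to generate audio_seconds of speech.

    The default ratio of 0.3 is the figure reported in the abstract;
    everything else here is an illustrative assumption.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * ratio


if __name__ == "__main__":
    # Hypothetical 10-second utterance at the reported ratio.
    print(f"estimated synthesis time: {synthesis_time(10.0):.1f} s")
```

Under this definition, a 10-second utterance at the reported ratio would take about 3 seconds to generate on the stated hardware.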