Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005

Authors:
Heiga Zen; Tomoki Toda; Masaru Nakamura; Keiichi Tokuda
Affiliations:
The authors are with the Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya-shi, 466-8555 Japan. E-mail: zen@ics.nitech.ac.jp, masha@ics.nitech.ac.jp, ...; The author is with the Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma-shi, 630-0192 Japan. E-mail: tomoki@is.naist.jp
Venue:
IEICE - Transactions on Information and Systems
Year:
2007

Abstract

In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named the Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge, entering an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system, and then describe the new features integrated into Nitech-HTS 2005, such as STRAIGHT-based vocoding, hidden semi-Markov model (HSMM)-based acoustic modeling, and a speech parameter generation algorithm considering global variance (GV). The constructed Nitech-HTS 2005 voices can generate speech waveforms at 0.3 × RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and the footprints of these voices are less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.