The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
IEICE - Transactions on Information and Systems
Review: Statistical parametric speech synthesis
Speech Communication
Integrating articulatory features into HMM-based parametric speech synthesis
IEEE Transactions on Audio, Speech, and Language Processing
Robust speaker-adaptive HMM-based text-to-speech synthesis
IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Audio, Speech, and Language Processing
Synthesis of child speech with HMM adaptation and voice conversion
IEEE Transactions on Audio, Speech, and Language Processing
Czech HMM-based speech synthesis
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
High quality emotional HMM-Based synthesis in Spanish
NOLISP'09 Proceedings of the 2009 international conference on Advances in Nonlinear Speech Processing
LSESpeak: A spoken language generator for Deaf people
Expert Systems with Applications: An International Journal
Expressive speech synthesis: a review
International Journal of Speech Technology
Complex cepstrum for statistical parametric speech synthesis
Speech Communication
Statistical parametric speech synthesis for Ibibio
Speech Communication
Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis
Journal of Signal Processing Systems
In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge, entering an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system, and then describe the new features integrated into Nitech-HTS 2005: STRAIGHT-based vocoding, hidden semi-Markov model (HSMM) based acoustic modeling, and a speech parameter generation algorithm that considers global variance (GV). Voices built with Nitech-HTS 2005 can generate speech waveforms at 0.3×RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and each voice has a footprint of less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.
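As a rough illustration of the speed figure quoted above, the sketch below assumes that the real-time ratio means processing time divided by the duration of the generated audio, so a value below 1 is faster than real time. The function names and the 10-second utterance are hypothetical and are not taken from the paper.

```python
def real_time_ratio(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by the duration of the generated audio.

    Assumed reading of "xRT": values below 1.0 mean faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(ratio: float, audio_seconds: float) -> float:
    """Time needed to synthesize audio_seconds of speech at the given ratio."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must not be negative")
    return ratio * audio_seconds


if __name__ == "__main__":
    # Hypothetical example: a 10-second utterance synthesized in 3 seconds.
    ratio = real_time_ratio(processing_seconds=3.0, audio_seconds=10.0)
    print(f"real-time ratio: {ratio:.2f} xRT")

    # At the 0.3 xRT reported in the abstract, the same utterance would take:
    print(f"processing time: {processing_time(0.3, 10.0):.1f} s")
```

Under this reading, 0.3×RT means that one minute of synthesized speech would take about 18 seconds to generate on the stated hardware.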