The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006

Authors:
Heiga Zen; Tomoki Toda; Keiichi Tokuda
Venue:
IEICE - Transactions on Information and Systems
Year:
2008

Abstract

We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.
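As a rough illustration of what a full covariance GV pdf involves, the sketch below assumes the usual definition of global variance as the per-dimension variance of a speech parameter trajectory within one utterance, and models the resulting GV vectors across utterances with a single Gaussian whose covariance matrix is full rather than diagonal. The function names and the toy data are hypothetical and are not taken from the paper or from the HTS toolkit.

```python
from statistics import fmean, pvariance


def global_variance(trajectory):
    """Per-dimension variance of one utterance's parameter trajectory.

    trajectory: list of T frames, each a list of D floats.
    """
    dims = list(zip(*trajectory))  # transpose into D sequences of length T
    return [pvariance(d) for d in dims]


def fit_full_covariance_gaussian(gv_vectors):
    """Maximum likelihood mean vector and full covariance matrix of GV vectors."""
    n = len(gv_vectors)
    d = len(gv_vectors[0])
    mean = [fmean(v[i] for v in gv_vectors) for i in range(d)]
    cov = [
        [
            sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in gv_vectors) / n
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov


# Toy data (made-up numbers): three "utterances" of two-dimensional frames.
utterances = [
    [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]],
    [[1.0, 0.0], [1.0, 2.0], [4.0, 4.0]],
    [[0.0, 0.0], [6.0, 3.0], [3.0, 6.0]],
]
gvs = [global_variance(u) for u in utterances]
mean, cov = fit_full_covariance_gaussian(gvs)
```

The off-diagonal entries of `cov` capture correlations between the variances of different parameter dimensions, which a diagonal covariance GV pdf would ignore. A real system would estimate this from many training utterances and far higher-dimensional features.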