Robust speaker-adaptive HMM-based text-to-speech synthesis

  • Authors:
  • Junichi Yamagishi; Takashi Nose; Heiga Zen; Zhen-Hua Ling; Tomoki Toda; Keiichi Tokuda; Simon King; Steve Renals

  • Affiliations:
  • Junichi Yamagishi, Simon King, Steve Renals: Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
  • Takashi Nose: Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Japan
  • Heiga Zen, Keiichi Tokuda: Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
  • Zhen-Hua Ling: iFlytek Speech Lab, University of Science and Technology of China, Hefei, Anhui, China
  • Tomoki Toda: Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2009

Abstract

This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called "HTS-2007," employs speaker adaptation (CSMAPLR+MAP, i.e., constrained structural maximum a posteriori linear regression followed by MAP adaptation), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better-quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison with several other speech synthesis techniques shows that the new system is very robust: it can build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences.
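
For context, here is a minimal sketch of the constrained-transform idea behind CSMAPLR. This is our own paraphrase in generic notation, not an excerpt from the paper: as in CMLLR, a single transform adapts both the mean and the covariance of each Gaussian, and the "structural MAP" part estimates each regression-class transform with its parent node's transform as the prior.

```latex
% Sketch (our notation): CMLLR-style constrained transform assumed by CSMAPLR.
% One transform W = [A, b] adapts both the mean and the covariance of each
% Gaussian, which keeps the adapted model equivalent to a linear transform
% of the observation features:
\[
  \hat{\mu} = A\mu + b, \qquad \hat{\Sigma} = A\,\Sigma\,A^{\top}.
\]
% Structural MAP: the transform at each node of the regression-class tree
% is estimated with the parent node's transform as the prior, so classes
% with little adaptation data O back off toward their parent:
\[
  \hat{W} = \operatorname*{arg\,max}_{W}\;
            p(O \mid \lambda, W)\; p\!\left(W \mid \hat{W}_{\mathrm{parent}}\right).
\]
```

Under this scheme, sparsely observed regression classes inherit most of their transform from higher tree levels, which is one reason the adaptation remains stable with small or uneven amounts of speech data.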