A Hidden Semi-Markov Model-Based Speech Synthesis System

Authors:
Heiga Zen;Keiichi Tokuda;Takashi Masuko;Takao Kobayashi;Tadashi Kitamura
Venue:
IEICE - Transactions on Information and Systems
Year:
2007


Abstract

A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can solve the above inconsistency because we can incorporate the state duration PDFs explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that use of HSMMs improves the reported naturalness of synthesized speech.
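The difference between an HMM's implicit state durations and an HSMM's explicit state duration PDFs can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the self-transition probability of 0.8, and the Gaussian-shaped duration distribution (mean 5 frames, standard deviation 1.5) are all assumptions chosen for illustration; the abstract does not specify the family of duration PDF. In a standard HMM, the time spent in a state follows a geometric distribution determined by the self-transition probability, so the most likely duration is always one frame. An explicit duration PDF can instead place its peak at a realistic duration.

```python
import math


def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration distribution implied by an HMM state.

    With self-transition probability a, staying exactly d frames has
    probability a**(d - 1) * (1 - a), for d = 1, 2, ...
    The list is truncated at max_duration, so it sums to slightly less than 1.
    """
    return [
        (self_loop_prob ** (d - 1)) * (1.0 - self_loop_prob)
        for d in range(1, max_duration + 1)
    ]


def gaussian_duration_pmf(mean, std, max_duration):
    """Hypothetical explicit duration distribution for an HSMM state.

    A Gaussian shape is evaluated at integer durations 1..max_duration
    and renormalized so the values sum to 1 (an illustrative assumption).
    """
    weights = [
        math.exp(-0.5 * ((d - mean) / std) ** 2)
        for d in range(1, max_duration + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_duration(pmf):
    """Return the duration (in frames, 1-based) with the highest probability."""
    return max(range(len(pmf)), key=pmf.__getitem__) + 1


implicit = geometric_duration_pmf(self_loop_prob=0.8, max_duration=50)
explicit = gaussian_duration_pmf(mean=5.0, std=1.5, max_duration=50)

print("HMM (implicit) most likely duration:", most_likely_duration(implicit))
print("HSMM (explicit) most likely duration:", most_likely_duration(explicit))
```

Under these assumptions, the geometric distribution decreases monotonically with duration, whereas the explicit distribution peaks near its mean. Using the same explicit duration PDFs in both training and synthesis is the consistency the abstract describes.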