A tone-modeling technique using a quantized F0 context to improve tone correctness in average-voice-based speech synthesis

Authors:
Vataya Chunwijitra;Takashi Nose;Takao Kobayashi
Affiliations:
Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan;Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan;Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan
Venue:
Speech Communication
Year:
2012

Abstract

This paper proposes a technique for improving tone correctness in speech synthesis of a tonal language, based on an average-voice model trained on a corpus of speech from nonprofessional speakers. We focused on reducing tone disagreements in speech data acquired from nonprofessional speakers without manually modifying the labels. To reduce the tone distortion caused by inconsistent tonal labeling, quantized F0 symbols were used as the context for F0 to obtain an appropriate F0 model. With this technique, the tonal context could be extracted directly from the original speech, which prevented inconsistency between the speech data and the F0 labels generated from transcriptions; such inconsistency degrades both the naturalness and the tone correctness of synthetic speech. We examined two types of labeling for the tonal context, using phone-based and sub-phone-based quantized F0 symbols. Subjective and objective evaluations of the synthetic voice were carried out in terms of tone intelligibility and naturalness. The results of both the objective and subjective tests showed that the proposed technique improved not only the naturalness but also the tone correctness of synthetic speech when only a small amount of speech data from nonprofessional target speakers was used.
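As a rough illustration of what deriving a quantized F0 context from the speech itself might look like, the sketch below assigns each phone (or each sub-phone segment) a symbol based on its mean log-F0. The abstract does not give the quantization details, so everything here is an assumption made for illustration: the function name `quantize_f0_context`, uniform quantization over the utterance's log-F0 range, the default of four levels, equal-length sub-phone splits, and the use of symbol 0 for unvoiced segments. It is not the authors' implementation.

```python
import math


def quantize_f0_context(f0, segments, n_levels=4, n_sub=1):
    """Return one list of quantized-F0 symbols per phone segment.

    f0       : per-frame F0 values in Hz, with 0.0 marking unvoiced frames
    segments : (start_frame, end_frame) pairs, end exclusive, one per phone
    n_levels : number of quantization levels (hypothetical choice)
    n_sub    : sub-segments per phone; 1 = phone-based, >1 = sub-phone-based

    Symbol 0 marks an unvoiced (sub-)segment; 1..n_levels are pitch levels.
    """
    voiced = [math.log(v) for v in f0 if v > 0]
    if not voiced:
        return [[0] * n_sub for _ in segments]

    lo, hi = min(voiced), max(voiced)
    # Uniform bins over the observed log-F0 range (assumed scheme);
    # fall back to a unit width when F0 is completely flat.
    width = (hi - lo) / n_levels or 1.0

    labels = []
    for start, end in segments:
        symbols = []
        for k in range(n_sub):
            # Split the phone into n_sub roughly equal frame ranges.
            a = start + (end - start) * k // n_sub
            b = start + (end - start) * (k + 1) // n_sub
            chunk = [math.log(v) for v in f0[a:b] if v > 0]
            if not chunk:
                symbols.append(0)  # no voiced frames in this range
                continue
            mean = sum(chunk) / len(chunk)
            level = min(int((mean - lo) / width), n_levels - 1) + 1
            symbols.append(level)
        labels.append(symbols)
    return labels
```

Calling the function with `n_sub=1` gives one symbol per phone, while `n_sub=2` or more gives the finer sub-phone labeling; in either case the symbols come from the measured F0 rather than from the transcription, which is the property the abstract relies on.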