Data-driven generation of F0 contours using a superpositional model

Authors:
A. Sakurai; K. Hirose; N. Minematsu
Affiliations:
DCES Software Laboratory, Texas Instruments Japan, Miyukigaoka 17, Tsukuba, Ibaraki 305-0841, Japan
Graduate School of Frontier Sciences, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
Venue:
Speech Communication
Year:
2003

Abstract

This paper introduces a novel model-constrained, data-driven method to generate fundamental frequency contours for Japanese text-to-speech synthesis. In the training phase, the relationship between linguistic features and the parameters of a command-response F0 contour generation model is learned by a prediction module, which is represented by either a neural network or a set of binary regression trees. Input features consist of linguistic information related to accentual phrases that can be automatically derived from text, such as the position of the accentual phrase in the utterance, number of morae, accent type, and morphological information. In the synthesis phase, the prediction module is used to generate appropriate values of model parameters. The use of the parametric model restricts the degrees of freedom of the problem to facilitate the mapping between linguistic and prosodic features. Experimental results show that the method makes it possible to generate quite natural F0 contours with a relatively small training database.
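
The abstract does not give the equations of the command-response model. As an illustration only, the sketch below assumes the standard Fujisaki formulation, in which ln F0(t) is the sum of a baseline, phrase components (impulse responses) and accent components (clipped step responses). The constants ALPHA, BETA and GAMMA, the function names, and the command values in the usage lines are assumptions chosen for the example, not values from the paper. In the method described above, the command timings and magnitudes would be supplied by the prediction module (neural network or regression trees) rather than written by hand.

```python
import math

# Assumed constants (commonly used values for the Fujisaki model, not taken from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before the command."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrase_cmds, accent_cmds):
    """Superpose baseline, phrase and accent components in the log-F0 domain.

    phrase_cmds: list of (onset_time_s, magnitude)
    accent_cmds: list of (onset_time_s, offset_time_s, amplitude)
    """
    value = math.log(base_hz)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def f0_contour(times, base_hz, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time point."""
    return [math.exp(log_f0(t, base_hz, phrase_cmds, accent_cmds)) for t in times]


# Usage with made-up commands: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]           # 0.00 s to 1.99 s in 10 ms steps
contour = f0_contour(times, 80.0, [(0.0, 0.5)], [(0.3, 0.7, 0.4)])
```

Because the whole contour is determined by a handful of command parameters per accentual phrase, a predictor only has to learn those few values from the linguistic features, which is the restriction in degrees of freedom that the abstract refers to.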