Data-driven generation of F0 contours using a superpositional model

  • Authors:
  • A. Sakurai; K. Hirose; N. Minematsu

  • Affiliations:
  • DCES Software Laboratory, Texas Instruments Japan, Miyukigaoka 17, Tsukuba, Ibaraki 305-0841, Japan; Graduate School of Frontier Sciences, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan; Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan

  • Venue:
  • Speech Communication
  • Year:
  • 2003

Abstract

This paper introduces a novel model-constrained, data-driven method to generate fundamental frequency contours for Japanese text-to-speech synthesis. In the training phase, the relationship between linguistic features and the parameters of a command-response F0 contour generation model is learned by a prediction module, which is represented by either a neural network or a set of binary regression trees. Input features consist of linguistic information related to accentual phrases that can be automatically derived from text, such as the position of the accentual phrase in the utterance, number of morae, accent type, and morphological information. In the synthesis phase, the prediction module is used to generate appropriate values of model parameters. The use of the parametric model restricts the degrees of freedom of the problem to facilitate the mapping between linguistic and prosodic features. Experimental results show that the method makes it possible to generate quite natural F0 contours with a relatively small training database.
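To make the synthesis phase concrete, the sketch below turns a set of model parameters into an F0 contour. It assumes the command-response model takes the commonly published Fujisaki form, in which log F0 is a baseline plus phrase components (impulse responses) and accent components (step responses); the abstract does not spell out the equations. The constants `ALPHA`, `BETA`, and `GAMMA`, the function names, and the command values are illustrative assumptions, with the command values standing in for what the prediction module would output.

```python
import math

# Assumed typical constants for the Fujisaki-style model (not taken from the abstract).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling on the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before onset."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """Superpose phrase and accent components on the baseline in the log domain.

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical parameter values, standing in for the prediction module's output.
times = [i * 0.01 for i in range(200)]  # 2 s sampled every 10 ms
f0 = f0_contour(
    times,
    fb=120.0,                      # baseline frequency in Hz
    phrase_cmds=[(0.0, 0.5)],      # one phrase command at t = 0 s
    accent_cmds=[(0.3, 0.8, 0.4)]  # one accent command from 0.3 s to 0.8 s
)
```

Because each accentual phrase contributes only a handful of values (command timings and magnitudes), the predictor has far fewer targets to learn than it would if it predicted F0 frame by frame, which is the sense in which the parametric model restricts the degrees of freedom.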