This paper introduces a novel model-constrained, data-driven method for generating fundamental frequency (F0) contours for Japanese text-to-speech synthesis. In the training phase, a prediction module, represented either by a neural network or by a set of binary regression trees, learns the relationship between linguistic features and the parameters of a command-response F0 contour generation model. The input features are linguistic attributes of each accentual phrase that can be derived automatically from text, such as the position of the accentual phrase in the utterance, its number of morae, its accent type, and morphological information. In the synthesis phase, the prediction module generates appropriate values for the model parameters. Using the parametric model restricts the degrees of freedom of the problem, which makes the mapping from linguistic to prosodic features easier to learn. Experimental results show that the method can generate quite natural F0 contours from a relatively small training database.
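The abstract names a command-response F0 contour generation model but does not give its equations. Assuming it is the widely used Fujisaki formulation, in which log F0 is a baseline value plus the responses to phrase commands and accent commands, the synthesis side can be sketched in pure Python as below. The constants α = 3.0 s⁻¹, β = 20.0 s⁻¹ and γ = 0.9 are typical values from the general literature, not values reported in this paper, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    onset: float      # T0: command time in seconds
    amplitude: float  # Ap: phrase command magnitude


@dataclass
class AccentCommand:
    onset: float      # T1: accent command start in seconds
    offset: float     # T2: accent command end in seconds
    amplitude: float  # Aa: accent command amplitude


def phrase_response(t, alpha=3.0):
    """Phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control mechanism: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 (Hz) at each time in `times`, superposing responses in the log domain:

    ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
                     + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)]
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)  # baseline value Fb
        for p in phrases:
            log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
        for a in accents:
            log_f0 += a.amplitude * (accent_response(t - a.onset, beta, gamma)
                                     - accent_response(t - a.offset, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour
```

For example, `f0_contour([i * 0.01 for i in range(200)], 80.0, [PhraseCommand(0.0, 0.5)], [AccentCommand(0.3, 0.7, 0.4)])` returns a 2-second contour sampled every 10 ms. In the method described above, the command timings and amplitudes would come from the prediction module; the values used here are arbitrary.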
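The abstract does not say how its binary regression trees are grown. Purely as an illustration, the following is a minimal CART-style regression tree in pure Python, assuming one tree per model parameter (for example, an accent command amplitude) and numerically encoded accentual-phrase features such as (position in utterance, number of morae, accent type). Categorical morphological features are omitted for brevity. The names `sse`, `build_tree` and `predict` are hypothetical, and greedy squared-error splitting is the standard CART criterion, not necessarily the authors' choice.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, max_depth=3, min_leaf=2):
    """Grow a binary regression tree by greedy squared-error splits.

    rows: non-empty list of numeric feature tuples, e.g. the hypothetical
          encoding (position_in_utterance, n_morae, accent_type).
    targets: one model parameter value per row.
    """
    best = None  # (cost, feature_index, threshold)
    if max_depth > 0 and len(rows) >= 2 * min_leaf:
        for f in range(len(rows[0])):
            for threshold in sorted({r[f] for r in rows}):
                left = [y for r, y in zip(rows, targets) if r[f] <= threshold]
                right = [y for r, y in zip(rows, targets) if r[f] > threshold]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse(left) + sse(right)
                if best is None or cost < best[0]:
                    best = (cost, f, threshold)
    # Stop when no valid split exists or the best split does not reduce error.
    if best is None or best[0] >= sse(targets):
        return {"leaf": fmean(targets)}
    _, f, threshold = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[f] > threshold]

    def subtree(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          max_depth - 1, min_leaf)

    return {"feature": f, "threshold": threshold,
            "left": subtree(left_idx), "right": subtree(right_idx)}


def predict(tree, row):
    """Follow binary splits from the root down to a leaf mean."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]
```

At synthesis time, each model parameter would be predicted for every accentual phrase with its own tree, and the resulting commands passed to a contour generator. A neural network, the paper's other option, would replace `build_tree` and `predict` with a learned regression function over the same features.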