A fuzzy decision tree-based duration model for Standard Yorùbá text-to-speech synthesis

Authors:
Ọdẹ́túnjí A. Ọdẹ́jọbí;Shun Ha Sylvia Wong;Anthony J. Beaumont
Affiliations:
Computer Science, Aston University, Aston Triangle, Birmingham B4 7ET, UK and Room 109, Computer Buildings, Computer Science and Engineering Department, Ọbáfẹ́mi Awólọ́wọ̀ University, Ilé-Ifẹ̀, ...;Computer Science, Aston University, Aston Triangle, Birmingham B4 7ET, UK;Computer Science, Aston University, Aston Triangle, Birmingham B4 7ET, UK
Venue:
Computer Speech and Language
Year:
2007


Abstract

In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework, which is implemented using Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility: it allows the different dimensions of prosody, i.e. duration, intonation, and intensity, to be implemented independently using different techniques and then integrated. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. To evaluate the effectiveness of FDT in duration modelling, we also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to generalise from the training data, since it achieved better accuracy on the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than that produced by our CART model. We also observed that FDT is considerably more expressive than CART, because the representation in FDT is not restricted to a set of piecewise or discrete constant approximations. We therefore conclude that FDT is a practical approach to duration modelling in SY TTS applications.
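
The two quantitative measures named in the abstract, and the contrast between piecewise-constant and fuzzy tree outputs, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the single normalised input feature `x`, the threshold, the boundary width, and the leaf durations (in milliseconds) are hypothetical, and the one-split "stumps" are far simpler than the CART and FDT models the authors built. `corr` is assumed to be the Pearson correlation coefficient.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed durations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def corr(predicted, observed):
    """Pearson correlation coefficient between predicted and observed durations."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (dev_p * dev_o)


def crisp_stump(x, threshold=0.5, left=120.0, right=200.0):
    """One crisp split: the output jumps between two constant leaf values."""
    return left if x < threshold else right


def fuzzy_stump(x, threshold=0.5, width=0.2, left=120.0, right=200.0):
    """One fuzzy split: both leaves contribute, weighted by membership.

    Membership in the right branch rises linearly from 0 to 1 across the
    interval [threshold - width, threshold + width] (hypothetical choice).
    """
    mu_right = min(1.0, max(0.0, (x - (threshold - width)) / (2 * width)))
    return (1.0 - mu_right) * left + mu_right * right


if __name__ == "__main__":
    # Hypothetical feature values and observed syllable durations (ms).
    xs = [0.1, 0.35, 0.45, 0.55, 0.65, 0.9]
    observed = [118.0, 135.0, 150.0, 172.0, 188.0, 203.0]
    for name, model in (("crisp", crisp_stump), ("fuzzy", fuzzy_stump)):
        predicted = [model(x) for x in xs]
        print(name, "RMSE:", rmse(predicted, observed), "Corr:", corr(predicted, observed))
```

The crisp stump can only return one of its two leaf values, whereas the fuzzy stump returns intermediate values near the split. This is one simple way to read the abstract's remark that FDT is not restricted to piecewise or discrete constant approximations.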