In the present work, we propose a scheme for fusing different phone duration models that operate in parallel. Specifically, the predictions of a group of dissimilar, mutually independent duration models are fed to a machine learning algorithm, which reconciles and fuses the outputs of the individual models to yield more precise phone duration predictions. The performance of the individual duration models and of the proposed fusion scheme is evaluated on the American-English KED TIMIT database and on the Greek WCL-1 database. On both databases, the SVR-based individual model achieves the lowest error: relative to the second-best individual model, it reduces the mean absolute error (MAE) and the root mean square error (RMSE) by 5.5% and 3.7% on KED TIMIT, and by 6.8% and 3.7% on WCL-1. At the fusion stage, we evaluate the performance of 12 fusion techniques. When implemented with SVR-based fusion, the proposed scheme improves phone duration prediction accuracy over that of the best individual model, with relative reductions of the MAE and RMSE of 1.9% and 2.0% on KED TIMIT, and of 2.6% and 1.8% on WCL-1.
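The two error measures, the relative reduction quoted above, and the general idea of fusing parallel model outputs can be illustrated with a minimal sketch. It is not the implementation evaluated in this work. The duration values are invented toy data, all function names are hypothetical, and a one-parameter least-squares blend of two models stands in for the SVR-based fusion stage so that the example needs only the Python standard library.

```python
import math

def mae(pred, true):
    """Mean absolute error between predicted and reference durations."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean square error between predicted and reference durations."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def relative_reduction(baseline_error, new_error):
    """Relative error reduction in percent with respect to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error

def fit_blend_weight(pred_a, pred_b, true):
    """Closed-form least-squares weight w for fused = w * a + (1 - w) * b.

    Assumption: a simple linear stand-in for the learned fusion stage,
    not the SVR-based fusion described in the abstract.
    """
    num = sum((a - b) * (t - b) for a, b, t in zip(pred_a, pred_b, true))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5  # identical models: any weight works

def fuse(pred_a, pred_b, w):
    """Combine two parallel model outputs with blend weight w."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]

# Toy phone durations in milliseconds (invented for illustration only).
reference = [62.0, 85.0, 110.0, 47.0, 93.0, 71.0]
model_a = [65.0, 80.0, 118.0, 50.0, 90.0, 75.0]
model_b = [58.0, 88.0, 104.0, 45.0, 99.0, 66.0]

w = fit_blend_weight(model_a, model_b, reference)
fused = fuse(model_a, model_b, w)

best_individual_rmse = min(rmse(model_a, reference), rmse(model_b, reference))
fused_rmse = rmse(fused, reference)
print("blend weight:", round(w, 3))
print("relative RMSE reduction (%):",
      round(relative_reduction(best_individual_rmse, fused_rmse), 1))
```

Here the blend weight is fitted and scored on the same toy data, which guarantees that the fused RMSE is no worse than that of either input model. A realistic evaluation would fit the fusion stage on data kept separate from the test set.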