Improving phone duration modelling using support vector regression fusion

Authors:
Alexandros Lazaridis, Iosif Mporas, Todor Ganchev, George Kokkinakis, Nikos Fakotakis
Affiliation (all authors):
Artificial Intelligence Group, Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, 26500 Rion-Patras, Greece
Venue:
Speech Communication
Year:
2011

Abstract

In the present work, we propose a scheme for fusing several phone duration models that operate in parallel. Specifically, the predictions of a group of dissimilar individual duration models, which are independent of each other, are fed to a machine learning algorithm that reconciles and fuses their outputs, yielding more precise phone duration predictions. The performance of the individual duration models and of the proposed fusion scheme is evaluated on the American-English KED TIMIT and the Greek WCL-1 databases. On both databases, the individual model based on support vector regression (SVR) achieves the lowest error rate: compared with the second-best individual algorithm, it reduces the mean absolute error (MAE) and the root mean square error (RMSE) by 5.5% and 3.7% (relative) on KED TIMIT, and by 6.8% and 3.7% on WCL-1. At the fusion stage, we evaluate 12 fusion techniques. When implemented with SVR-based fusion, the proposed scheme improves phone duration prediction accuracy over the best individual model, reducing the MAE and RMSE by 1.9% and 2.0% (relative) on KED TIMIT and by 2.6% and 1.8% on WCL-1.
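The two-stage scheme described in the abstract, where individual duration models run in parallel and a second learner fuses their predictions, is a form of stacked generalization. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: all function names (`fit_linear_fusion`, `apply_fusion`, `relative_reduction`, etc.) and all duration values are hypothetical, and ordinary least-squares linear regression is swapped in as the fusion learner in place of support vector regression (SVR) so that the example needs only the standard library. It also computes MAE, RMSE, and the relative error reduction in which the results above are reported (for example, an MAE falling from 100 to 94.5 is a 5.5% relative reduction).

```python
import math
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent, e.g. 100 -> 94.5 gives 5.5."""
    return 100.0 * (baseline - improved) / baseline


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_fusion(base_preds: List[List[float]], y: List[float]) -> List[float]:
    """Hypothetical fusion stage: least-squares weights (bias first) that map
    the base models' predictions to the target durations.
    base_preds[j][i] is base model j's prediction for phone i."""
    rows = [[1.0] + [p[i] for p in base_preds] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def apply_fusion(weights: List[float], base_preds: List[List[float]]) -> List[float]:
    """Fused prediction per phone: bias + weighted sum of base predictions."""
    n = len(base_preds[0])
    return [weights[0] + sum(w * p[i] for w, p in zip(weights[1:], base_preds))
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical phone durations (ms) and predictions of three hypothetical
    # base models; these are NOT data from the paper.
    target = [62.0, 85.0, 104.0, 118.0, 71.0, 93.0]
    base = [
        [58.0, 90.0, 100.0, 125.0, 66.0, 97.0],
        [66.0, 80.0, 110.0, 112.0, 75.0, 88.0],
        [60.0, 88.0, 99.0, 121.0, 70.0, 95.0],
    ]
    weights = fit_linear_fusion(base, target)
    fused = apply_fusion(weights, base)
    best = min(rmse(target, p) for p in base)
    for j, preds in enumerate(base):
        print(f"model {j}: MAE={mae(target, preds):.2f} RMSE={rmse(target, preds):.2f}")
    print(f"fused:   MAE={mae(target, fused):.2f} RMSE={rmse(target, fused):.2f}")
    print(f"relative RMSE reduction vs best model: "
          f"{relative_reduction(best, rmse(target, fused)):.1f}%")
```

On the data it was fitted to, the least-squares fusion cannot have a higher RMSE than any single base model, since giving one model weight 1 and the rest weight 0 is itself a feasible solution. That in-sample guarantee says nothing about unseen phones; in stacking generally, the fusion learner is trained on base-model predictions for data not used to train those base models (for example, via cross-validation). An SVR fusion stage, as in the abstract, could replace `fit_linear_fusion` with a learner such as scikit-learn's `sklearn.svm.SVR` fitted on the same stacked predictions.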