Two-stage phone duration modelling with feature construction and feature vector extension for the needs of speech synthesis

Authors:
Alexandros Lazaridis;Todor Ganchev;Iosif Mporas;Evaggelos Dermatas;Nikos Fakotakis
Affiliations:
Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, 26500 Rion-Patras, Greece (all authors)
Venue:
Computer Speech and Language
Year:
2012

Abstract

We propose a two-stage phone duration modelling scheme that can improve prosody modelling in speech synthesis systems. The first stage employs a number of independent feature constructors (FCs); the second stage is a phone duration model (PDM) that operates on an extended feature vector. The initial feature vector, which is the input to the first stage, consists of numerical and non-numerical linguistic features extracted from text. The extended feature vector is obtained by appending the phone duration predictions of the FCs to the initial feature vector. Experiments on the American-English KED TIMIT and the Modern Greek WCL-1 databases validated the advantage of the proposed two-stage scheme: it improved prediction accuracy over the best individual predictor and over a two-stage scheme that only fuses the first-stage outputs. Specifically, compared to the best individual predictor, the mean absolute error and the root mean square error were reduced in relative terms by 3.9% and 3.9% on KED TIMIT, and by 4.8% and 4.6% on WCL-1, respectively.
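
The data flow of the two-stage scheme (FC predictions appended to the initial feature vector, then a PDM trained on the extended vector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy data, the group-mean predictors standing in for the FCs, the 1-nearest-neighbour learner standing in for the PDM, and the names `fit_group_mean`, `extend`, `distance` and `fit_pdm`. Fitting the two stages on separate subsets is likewise a simplification chosen here.

```python
from statistics import mean

# Toy rows: (phone identity, stress flag, position in syllable) -> duration in ms.
# All values are made up for illustration.
FC_TRAIN = [(("a", 1, 0), 95.0), (("a", 0, 1), 70.0),
            (("t", 0, 0), 40.0), (("t", 1, 2), 55.0)]
PDM_TRAIN = [(("a", 1, 1), 90.0), (("t", 0, 1), 45.0), (("a", 0, 0), 72.0)]

def fit_group_mean(data, index):
    """Hypothetical FC: mean duration observed for each value of one feature."""
    groups = {}
    for features, duration in data:
        groups.setdefault(features[index], []).append(duration)
    table = {value: mean(ds) for value, ds in groups.items()}
    fallback = mean(d for _, d in data)  # used for unseen feature values
    return lambda features: table.get(features[index], fallback)

def extend(features, constructors):
    """Append each FC's duration prediction to the initial feature vector."""
    return tuple(features) + tuple(fc(features) for fc in constructors)

def distance(u, v):
    """Mixed distance: scaled absolute difference for float entries
    (the appended predictions), mismatch count for all other entries."""
    total = 0.0
    for a, b in zip(u, v):
        if isinstance(a, float) and isinstance(b, float):
            total += abs(a - b) / 100.0  # crude scaling of ms values
        else:
            total += 0.0 if a == b else 1.0
    return total

def fit_pdm(data, constructors):
    """Toy second-stage PDM: 1-nearest-neighbour on the extended vectors."""
    memory = [(extend(f, constructors), d) for f, d in data]
    def predict(features):
        query = extend(features, constructors)
        return min(memory, key=lambda item: distance(query, item[0]))[1]
    return predict

# Stage 1: independent FCs, each trained on the initial features.
constructors = [fit_group_mean(FC_TRAIN, 0), fit_group_mean(FC_TRAIN, 1)]
# Stage 2: PDM trained on the extended feature vectors.
pdm = fit_pdm(PDM_TRAIN, constructors)

extended = extend(("a", 1, 2), constructors)
prediction = pdm(("a", 1, 2))
print(extended, prediction)
```

The point of the sketch is that the second stage sees both the original linguistic features and the first-stage predictions, rather than the predictions alone as in a pure fusion scheme.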