Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora

Authors:
Gerasimos Xydas;Dimitris Spiliotopoulos;Georgios Kouroupetroglou
Affiliations:
The authors are with the Department of Informatics and Telecommunications, University of Athens, Panepistimiopolis, Ilisia, GR-15784, Athens, Greece. E-mail: koupe@di.uoa.gr
Venue:
IEICE - Transactions on Information and Systems
Year:
2005

Abstract

Synthetic speech often suffers from a poor F0 contour. Robust prediction of the underlying pitch targets relies on the quality of the predicted prosodic structures, i.e., the corresponding sequences of tones and breaks. In the present work, we use a linguistically enriched annotated corpus to build data-driven models that predict prosodic structures with increased accuracy, and we then apply a linear regression approach to F0 modeling. An XML annotation scheme is introduced to encode syntax, grammar, new or already given information, phrase subject/object information, and rhetorical elements in the corpus, by exploiting a Natural Language Generator (NLG) system. To demonstrate the benefits of the enriched input meta-information, we first show that although the tone and break CART predictors achieve high accuracy when evaluated in isolation (92.35% for breaks, 87.76% for accents and 99.03% for endtones), their application in the TtS chain degrades the Linear Regression pitch target model. The enriched linguistic meta-information, in contrast, reduces the errors of the models, leading to a more natural F0 surface. The intonation contours were evaluated both objectively and subjectively, taking into account the propagated errors introduced by each model in the synthesis chain.
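The error propagation described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the data are hypothetical toy values, the regression uses a single binary accent feature instead of the paper's feature set, and the list `pred_accent` is a hand-written stand-in for the output of an upstream accent predictor. The sketch only shows why a predictor that looks accurate in isolation can still inflate the error of a downstream pitch target model.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse(a, b, xs, ys):
    """Root mean squared error of the fitted line on (xs, ys)."""
    return sqrt(sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical toy data (assumption): 1 = syllable carries a pitch accent.
gold_accent = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
f0_hz = [110, 150, 112, 108, 148, 111, 152, 109, 150, 110]

# Stand-in for an upstream accent predictor (assumption): 8 of 10 labels correct.
pred_accent = [0, 1, 0, 1, 1, 0, 0, 0, 1, 0]

# Train the pitch target model on gold labels.
a, b = fit_line(gold_accent, f0_hz)

# Evaluate once with gold labels and once with labels from the "chain".
rmse_gold = rmse(a, b, gold_accent, f0_hz)
rmse_chain = rmse(a, b, pred_accent, f0_hz)
accent_accuracy = sum(g == p for g, p in zip(gold_accent, pred_accent)) / len(gold_accent)

print(f"accent accuracy: {accent_accuracy:.2f}")
print(f"F0 RMSE with gold accents: {rmse_gold:.2f} Hz")
print(f"F0 RMSE with predicted accents: {rmse_chain:.2f} Hz")
```

In this toy setting, each misplaced accent shifts the predicted pitch target by the full accent offset, so a small number of label errors dominates the F0 error even though the accent labels are mostly correct.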