Evaluation of automatic break insertion for an agglutinative and inflected language

Authors:
Eva Navas;Inmaculada Hernáez;Iñaki Sainz
Affiliations:
Departamento de Electrónica y Telecomunicaciones, University of the Basque Country, Alda. Urquijo s/n, 48013 Bilbao, Spain;Departamento de Electrónica y Telecomunicaciones, University of the Basque Country, Alda. Urquijo s/n, 48013 Bilbao, Spain;Departamento de Electrónica y Telecomunicaciones, University of the Basque Country, Alda. Urquijo s/n, 48013 Bilbao, Spain
Venue:
Speech Communication
Year:
2008

Abstract

This paper presents an evaluation of automatic break insertion for standard Basque. Basque is an agglutinative and inflected language, and part-of-speech (POS) features, widely used for other languages, are not enough to predict break locations accurately. Other morpho-syntactic features, such as grammatical case and information about syntagms, have therefore also been taken into account. Classification and regression trees (CARTs) were used to predict break locations on a textual corpus gathered specially for this study, from which the sentence-internal punctuation marks had been removed. After applying parameter selection to the whole morpho-syntactic feature set, the best features were used to build two CARTs: T1, which gives the same importance to deletion and insertion errors, and T2, which tries to minimize insertion errors. The objective evaluation of the break insertion algorithms gives a κ statistic of 0.518 and an F measure of 0.757 for the T1 tree. The algorithms were also evaluated subjectively: although T1 had better objective measures, it made more serious errors than T2.
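The two objective measures named in the abstract can be illustrated with a minimal sketch. It assumes that each word juncture is labelled 1 (break) or 0 (no break), that κ is Cohen's kappa between the reference and predicted labels, and that F is the harmonic mean of precision and recall on the break class. The abstract does not give the exact definitions used in the paper, so these are assumptions, and the function name `break_metrics` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def break_metrics(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compare predicted break labels with reference labels, one per word juncture.

    Labels: 1 = break, 0 = no break (an assumed encoding, not taken from the paper).
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("reference and predicted must be non-empty and equally long")

    n = len(reference)
    pairs = list(zip(reference, predicted))
    hits = sum(1 for r, p in pairs if r == 1 and p == 1)
    # Insertion error: a break is predicted where the reference has none.
    insertions = sum(1 for r, p in pairs if r == 0 and p == 1)
    # Deletion error: a reference break is missed by the prediction.
    deletions = sum(1 for r, p in pairs if r == 1 and p == 0)
    correct_non_breaks = n - hits - insertions - deletions

    precision = hits / (hits + insertions) if hits + insertions else 0.0
    recall = hits / (hits + deletions) if hits + deletions else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (hits + correct_non_breaks) / n
    ref_rate = (hits + deletions) / n
    pred_rate = (hits + insertions) / n
    expected = ref_rate * pred_rate + (1 - ref_rate) * (1 - pred_rate)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0

    return {
        "insertions": float(insertions),
        "deletions": float(deletions),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels invented for illustration; they are not data from the paper.
    reference = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
    predicted = [0, 1, 0, 1, 0, 0, 0, 1, 0, 0]
    for name, value in break_metrics(reference, predicted).items():
        print(f"{name}: {value:.3f}")
```

Counting insertions and deletions separately mirrors the distinction between T1 and T2: a tree tuned to avoid insertion errors would be expected to trade some recall for higher precision, which a single κ or F value does not show on its own.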