Automatic accentuation of words for Slovenian TTS system

Authors:
Tomaž Šef
Affiliations:
Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia
Venue:
SIP'06 Proceedings of the 5th WSEAS international conference on Signal processing
Year:
2006

Abstract

The accentuation of unknown Slovene words is a challenging task for automated solvers, since in Slovenian stress can fall on any syllable. Most words have only one stressed syllable, but there are also words with no stress and words with more than one stress. Furthermore, different forms of the same word can be stressed differently. In this paper, we present a two-level lexical stress assignment model for out-of-vocabulary Slovenian words, used in our text-to-speech system. First, each vowel is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel. Then, some corrections are made at the word level, according to the number of stressed vowels and the length of the word. We applied a machine-learning technique (decision trees or boosted decision trees). The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that accentuation in the Slovenian language is a very complex problem and that a simple solution in the form of relatively simple rules is not possible.
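
The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the system from the paper: the abstract does not give the features, the trained trees, or the actual correction rules, so everything named here is an assumption made for illustration. `score_vowel` stands in for a trained per-vowel classifier (decision tree or boosted trees), `toy_score` is a placeholder that simply favours the penultimate vowel, and the word-level rules (`threshold`, `short_word_vowels`, keeping the best-scoring vowel) are hypothetical. The type of lexical stress is not modelled.

```python
from typing import Callable, List

# Simplification (assumption): plain vowels only; syllabic "r" is ignored.
VOWELS = set("aeiou")


def vowel_positions(word: str) -> List[int]:
    """Return the character indices of the vowels in the word."""
    return [i for i, ch in enumerate(word.lower()) if ch in VOWELS]


def assign_stress(
    word: str,
    score_vowel: Callable[[str, int], float],
    threshold: float = 0.5,
    short_word_vowels: int = 3,
) -> List[int]:
    """Return the character indices of the vowels marked as stressed.

    Level 1 scores every vowel independently.
    Level 2 applies hypothetical word-level corrections based on the
    number of stressed vowels and the length of the word.
    """
    positions = vowel_positions(word)
    if not positions:
        return []

    # Level 1: per-vowel decision from the (stand-in) classifier.
    scores = [score_vowel(word, p) for p in positions]
    stressed = [p for p, s in zip(positions, scores) if s >= threshold]

    # Level 2: word-level corrections (illustrative rules, not the paper's).
    best = positions[scores.index(max(scores))]
    if not stressed:
        # No vowel passed the threshold: keep the best-scoring one.
        return [best]
    if len(stressed) > 1 and len(positions) <= short_word_vowels:
        # Several stresses in a short word: keep only the best-scoring one.
        return [best]
    return stressed


def toy_score(word: str, index: int) -> float:
    """Placeholder scorer: favours the penultimate vowel of the word."""
    positions = vowel_positions(word)
    target = positions[-2] if len(positions) >= 2 else positions[-1]
    return 0.9 if index == target else 0.1
```

Calling `assign_stress("beseda", toy_score)` returns the index of the second "e", since the placeholder scorer favours the penultimate vowel. In a real system the scorer would be replaced by a classifier trained on an accentuated lexicon, and the correction rules would be tuned on held-out data.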