Automatic accentuation of words for Slovenian TTS system

  • Authors:
  • Tomaž Šef

  • Affiliations:
  • Department of Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia

  • Venue:
  • SIP'06 Proceedings of the 5th WSEAS international conference on Signal processing
  • Year:
  • 2006

Abstract

The accentuation of unknown Slovenian words is a challenging task for automated solvers, since in Slovenian stress can fall on any syllable. Most words have exactly one stressed syllable, but there are also words with no stress and words with more than one stress. Furthermore, different forms of the same word can be stressed differently. In this paper, we present a two-level lexical stress assignment model for out-of-vocabulary Slovenian words, used in our text-to-speech system. First, each vowel is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel. Then, corrections are made at the word level, according to the number of stressed vowels and the length of the word. We applied machine-learning techniques (decision trees and boosted decision trees). The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that accentuation in Slovenian is a very complex problem and that no solution in the form of relatively simple rules is possible.
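The two-level scheme described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the paper's method: the function names, the 0.5 threshold, the `short_word_max_vowels` cutoff, the specific correction rules, and the toy scorer that stands in for a trained (boosted) decision tree. The sketch also omits the stress-type assignment and, unlike real Slovenian, never returns an unstressed word.

```python
from typing import Callable, List

# Simplification (assumption): only the five vowel letters are candidates;
# the syllabic 'r' of Slovenian is ignored in this sketch.
VOWELS = set("aeiou")


def vowel_positions(word: str) -> List[int]:
    """Return the character indices of all vowels in the word."""
    return [i for i, ch in enumerate(word.lower()) if ch in VOWELS]


def assign_stress(
    word: str,
    score_vowel: Callable[[str, int], float],
    threshold: float = 0.5,           # hypothetical decision threshold
    short_word_max_vowels: int = 3,   # hypothetical word-length cutoff
) -> List[int]:
    """Return the character indices of the vowels marked as stressed."""
    positions = vowel_positions(word)
    if not positions:
        return []

    # Level 1: decide for each vowel independently.
    scores = [score_vowel(word, pos) for pos in positions]
    stressed = [p for p, s in zip(positions, scores) if s >= threshold]

    # Level 2: word-level correction based on the number of stressed
    # vowels and the word length (illustrative rules only).
    best = positions[max(range(len(scores)), key=scores.__getitem__)]
    if not stressed:
        # No vowel passed the threshold: fall back to the best candidate.
        stressed = [best]
    elif len(stressed) > 1 and len(positions) <= short_word_max_vowels:
        # Short word with several stresses: keep only the best candidate.
        stressed = [best]
    return stressed


def toy_score(word: str, pos: int) -> float:
    """Hypothetical stand-in for a trained classifier: prefers the
    penultimate vowel. A real system would use learned features."""
    positions = vowel_positions(word)
    penultimate = positions[-2] if len(positions) >= 2 else None
    return 0.6 if pos == penultimate else 0.2


if __name__ == "__main__":
    for w in ["beseda", "grad"]:
        print(w, assign_stress(w, toy_score))
```

In practice `score_vowel` would be replaced by a model trained on an accentuated lexicon, and the word-level rules would be tuned or learned rather than hard-coded.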