Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

  • Authors:
  • Piotr Majewski

  • Affiliations:
  • Faculty of Mathematics and Computer Science, University of Łódź, Łódź, Poland 90-238

  • Venue:
  • TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most of state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not optimal solution for inflectional or agglutinative languages. The Polish language is highly inflectional one and requires a very large corpora to create a sufficient language model with the small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to highly inflectional language like Polish. In case of lack of resources (i.e. small corpora) syllable-based model outperforms word-based models in terms of number of out-of-vocabulary units (syllables in our model). Such model is an approximation of the morpheme-based model for Polish. In our paper, we show results of evaluation of syllable based model and its usefulness in speech recognition tasks.