Morpheme Based Language Models for Speech Recognition of Czech

  • Authors:
  • William J. Byrne;Jan Hajic;Pavel Krbec;Pavel Ircing;Josef Psutka

  • Venue:
  • TDS '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
  • Year:
  • 2000

Abstract

In this paper we propose a new technique for language modelling of highly inflectional languages such as Czech, Russian and other Slavic languages. Our aim is to alleviate the main problem encountered in these languages, namely the enormous vocabulary growth caused by the large number of different word forms derived from a single word (lemma). We reduced the size of the vocabulary by decomposing words into stems and endings and storing these sub-word units (morphemes) in the vocabulary separately. We then trained a morpheme-based language model on the decomposed corpus. This paper reports perplexities, OOV rates and some speech recognition results obtained with the new language model.
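
The effect of the decomposition on vocabulary size can be illustrated with a toy sketch. It is not the authors' procedure: the ending list `ENDINGS`, the `+` marker on endings, the `min_stem` threshold and the longest-match suffix stripping are all assumptions made for illustration, and the abstract does not say how stems and endings were actually identified.

```python
from collections import Counter

# Hypothetical ending inventory (assumption); a real system would rely on
# a morphological analyser rather than a hand-written suffix list.
ENDINGS = ("ami", "ou", "a", "u", "y")


def decompose(word, endings=ENDINGS, min_stem=3):
    """Split a word into [stem, '+ending'] using longest-match suffix stripping.

    Words with no matching ending (or too short a stem) are returned whole.
    """
    for end in sorted(endings, key=len, reverse=True):
        if word.endswith(end) and len(word) - len(end) >= min_stem:
            return [word[:-len(end)], "+" + end]
    return [word]


def decompose_corpus(tokens):
    """Replace every word token by its sequence of sub-word units."""
    units = []
    for token in tokens:
        units.extend(decompose(token))
    return units


def bigram_counts(units):
    """Raw bigram counts over the decomposed unit stream."""
    return Counter(zip(units, units[1:]))


# Ten inflected forms of two Czech nouns (kniha "book", ryba "fish").
words = "kniha knihu knihy knihou knihami ryba rybu ryby rybou rybami".split()
units = decompose_corpus(words)

word_vocab = set(words)   # one vocabulary entry per word form
unit_vocab = set(units)   # stems and endings stored separately
counts = bigram_counts(units)
```

In this toy corpus the ten word forms share two stems and five endings, so the unit vocabulary is smaller than the word vocabulary, and a form unseen in training can still be covered if its stem and ending each appear in the vocabulary. The bigram counts over `units` stand in for the statistics an n-gram model would be estimated from; smoothing and the split between stem and ending contexts are left out.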