Morpheme Based Language Models for Speech Recognition of Czech

Authors:
William J. Byrne;Jan Hajic;Pavel Krbec;Pavel Ircing;Josef Psutka
Venue:
TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
Year:
2000



Abstract

In this paper we propose a new technique for language modelling of highly inflectional languages such as Czech, Russian and other Slavic languages. Our aim is to alleviate the main problem encountered in these languages: the enormous vocabulary growth caused by the great number of different word forms derived from a single word (lemma). We reduce the size of the vocabulary by decomposing words into stems and endings and storing these sub-word units (morphemes) in the vocabulary separately. We then train a morpheme-based language model on the decomposed corpus. The paper reports perplexities, out-of-vocabulary (OOV) rates and some speech recognition results obtained with the new language model.
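The effect of stem and ending decomposition on the OOV rate can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not describe how the authors decompose words, so the ending inventory `ENDINGS`, the longest-suffix rule in `decompose`, the `+` marker for endings and the toy word lists are all hypothetical. Note also that the morpheme-level rate is computed over morpheme tokens rather than words, so the two figures are not directly comparable on real data.

```python
# Hypothetical ending inventory, for illustration only (not taken from the paper).
ENDINGS = ("ami", "ách", "ou", "a", "u", "y", "e")


def decompose(word, endings=ENDINGS, min_stem=2):
    """Split a word into (stem, ending) using the longest matching ending.

    Endings are prefixed with '+' so they cannot collide with stems.
    A word with no matching ending gets the empty-ending marker '+'.
    """
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            return word[: -len(ending)], "+" + ending
    return word, "+"


def decompose_corpus(tokens):
    """Replace every word token by its stem token followed by its ending token."""
    units = []
    for token in tokens:
        units.extend(decompose(token))
    return units


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocabulary = set(train_tokens)
    unseen = sum(1 for token in test_tokens if token not in vocabulary)
    return unseen / len(test_tokens)


# Toy data: the test word forms are unseen, but their stems and endings are not.
train = ["žena", "ženy", "ženu", "hrad", "hradu", "ryba", "rybou"]
test = ["ženou", "hrady", "ryby"]

word_oov = oov_rate(train, test)
morpheme_oov = oov_rate(decompose_corpus(train), decompose_corpus(test))
print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"morpheme-level OOV rate: {morpheme_oov:.2f}")
```

In this toy setting every test word form is out of vocabulary at the word level, while all of its stems and endings were seen in training. A morpheme-based n-gram model would then be estimated on the output of `decompose_corpus` instead of on the original word sequence.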