Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

Authors:
Piotr Majewski
Affiliations:
Faculty of Mathematics and Computer Science, University of Łódź, Łódź, Poland 90-238
Venue:
TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
Year:
2008



Abstract

Most state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is a highly inflectional language and requires very large corpora to build an adequate language model with a small out-of-vocabulary (OOV) rate. We propose a syllable-based language model, which is better suited to highly inflectional languages like Polish. When resources are scarce (i.e., with small corpora), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables, in our model). Such a model approximates a morpheme-based model for Polish. In this paper, we present an evaluation of the syllable-based model and of its usefulness in speech recognition tasks.
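The abstract's central argument is that, for an inflected language, a small training corpus leaves many test word forms unseen, while those same forms often break down into syllables that were seen. The sketch below illustrates that effect on a toy example. It is an illustration under stated assumptions, not the paper's method: the `syllabify` heuristic (a naive vowel-nucleus split that treats `i` before another vowel as non-syllabic) is hypothetical and much cruder than a real Polish syllabifier, and the tiny corpus is invented.

```python
# Hypothetical illustration: compare OOV rates for word units vs. syllable
# units on an invented toy Polish corpus. The syllabifier is a crude heuristic,
# not the syllabification used in the paper.

VOWELS = set("aąeęioóuy")


def syllabify(word: str) -> list[str]:
    """Split a lowercase Polish word into syllables (naive heuristic).

    Each syllable closes right after a vowel nucleus, so consonants between
    vowels go to the following syllable; word-final consonants attach to the
    last syllable. An 'i' directly followed by another vowel is treated as
    non-syllabic (a softness marker), so 'nie' stays one syllable.
    """
    syllables: list[str] = []
    current = ""
    for idx, ch in enumerate(word):
        current += ch
        next_is_vowel = idx + 1 < len(word) and word[idx + 1] in VOWELS
        if ch in VOWELS and not (ch == "i" and next_is_vowel):
            syllables.append(current)
            current = ""
    if current:  # trailing consonants (or a word with no vowel)
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables


def oov_rate(train_units: list[str], test_units: list[str]) -> float:
    """Fraction of test tokens whose unit never occurs in the training data."""
    vocab = set(train_units)
    misses = sum(1 for unit in test_units if unit not in vocab)
    return misses / len(test_units)


def to_syllables(words: list[str]) -> list[str]:
    """Flatten a word sequence into a syllable sequence."""
    return [syl for w in words for syl in syllabify(w)]


# Invented toy data: "mamie" and "mama" are forms of one lemma,
# as are "kotami" and "kota".
train_words = "mama ma kota".split()
test_words = "mamie kotami".split()

word_oov = oov_rate(train_words, test_words)
syl_oov = oov_rate(to_syllables(train_words), to_syllables(test_words))

print(f"word OOV: {word_oov:.2f}, syllable OOV: {syl_oov:.2f}")
# prints: word OOV: 1.00, syllable OOV: 0.40
```

Both test words are unseen as whole words, but three of their five syllables ("ma", "ko", "ta") already occur in training. The OOV count is measured over whatever units the model uses, words in one case and syllables in the other, which is the comparison the abstract describes.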