Most state-of-the-art large-vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is highly inflectional and requires very large corpora to build an adequate language model with a small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to highly inflectional languages such as Polish. When resources are scarce (i.e., corpora are small), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables in our model). Such a model is an approximation of a morpheme-based model for Polish. In this paper, we present the results of evaluating the syllable-based model and its usefulness in speech recognition tasks.
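To make the out-of-vocabulary comparison concrete, here is a minimal sketch of how an OOV ratio could be computed at the word level and at the syllable level. It is not the evaluation procedure from the paper: the function names (`oov_rate`, `to_words`, `to_syllables`) are hypothetical, the word lists are a hand-made toy example, and syllable boundaries are assumed to be given in advance as hyphens rather than produced by a real Polish syllabifier.

```python
def oov_rate(train_units, test_units):
    """Fraction of test tokens whose unit never occurs in the training data."""
    vocab = set(train_units)
    if not test_units:
        return 0.0
    unseen = sum(1 for unit in test_units if unit not in vocab)
    return unseen / len(test_units)


def to_words(hyphenated):
    """Drop the syllable marks to recover whole-word tokens."""
    return [w.replace("-", "") for w in hyphenated]


def to_syllables(hyphenated):
    """Split each pre-hyphenated word into its syllable tokens."""
    return [s for w in hyphenated for s in w.split("-")]


# Toy data (assumption): inflected Polish word forms, syllabified by hand.
train = ["do-mu", "ko-ta", "ma-ma", "ry-ba"]
test = ["ko-tu", "ma-my", "do-my"]

word_oov = oov_rate(to_words(train), to_words(test))
syllable_oov = oov_rate(to_syllables(train), to_syllables(test))

print(f"word-level OOV ratio:     {word_oov:.2f}")
print(f"syllable-level OOV ratio: {syllable_oov:.2f}")
```

On this toy data every test word form is unseen (ratio 1.0), while half of the test syllables already occur in training (ratio 0.5). This only illustrates why sub-word units tend to lower the OOV ratio when the corpus is small; it says nothing about the figures reported in the paper.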