Speech recognition of Czech: inclusion of rare words helps

Authors:
Petr Podveský; Pavel Machek
Affiliations:
Charles University, Prague, Czech Republic; Charles University, Prague, Czech Republic
Venue:
ACLstudent '05 Proceedings of the ACL Student Research Workshop
Year:
2005

Abstract

Large-vocabulary continuous speech recognition of inflective languages, such as Czech, Russian or Serbo-Croatian, is severely degraded by a high out-of-vocabulary (OOV) rate. In this paper, we tackle the problems of vocabulary selection, language modeling and pruning for inflective languages. We show that explicitly reducing the OOV rate yields significant improvements in recognition accuracy while keeping the model size nearly unchanged. Results are reported on Czech speech corpora.
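
As a rough illustration of the out-of-vocabulary rate the abstract refers to, the sketch below counts the share of running test tokens missing from a recognizer vocabulary. The function names, the frequency-cutoff vocabulary, and the toy Czech word forms are assumptions made for this example; they are not taken from the paper, and the cutoff is only a common baseline, not necessarily the authors' selection method.

```python
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Return the fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def top_k_vocabulary(training_tokens, k):
    """Hypothetical baseline: keep the k most frequent training word forms."""
    counts = Counter(training_tokens)
    return {word for word, _ in counts.most_common(k)}


# Toy data invented for illustration: inflected forms of Czech "žena".
train = "žena ženy ženě ženu žena ženy žena".split()
test = "žena ženu ženy ženám".split()

small = top_k_vocabulary(train, 2)  # drops the rare forms "ženě" and "ženu"
full = top_k_vocabulary(train, 4)   # keeps every form seen in training

print(oov_rate(test, small))  # 0.5
print(oov_rate(test, full))   # 0.25
```

In this toy case, keeping the rare inflected forms halves the OOV rate, which is the kind of effect the abstract attributes to including rare words.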