Speech recognition of Czech: inclusion of rare words helps

  • Authors: Petr Podveský, Pavel Machek
  • Affiliations: Charles University, Prague, Czech Republic (both authors)
  • Venue: ACLstudent '05: Proceedings of the ACL Student Research Workshop
  • Year: 2005

Abstract

Large-vocabulary continuous speech recognition of inflective languages, such as Czech, Russian, or Serbo-Croatian, is severely degraded by a high out-of-vocabulary (OOV) rate. In this paper, we tackle the problems of vocabulary selection, language modeling, and pruning for inflective languages. We show that explicitly reducing the OOV rate yields significant improvements in recognition accuracy while keeping the model size nearly unchanged. Results are reported on Czech speech corpora.
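
To make the central quantity concrete, the following is a minimal sketch of how an out-of-vocabulary rate might be measured: the fraction of test tokens missing from the recognizer's vocabulary. It is not the authors' method. The function names `select_vocabulary` and `oov_rate`, the frequency-based selection rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def select_vocabulary(train_tokens, size):
    """Return the `size` most frequent word types (hypothetical selection rule)."""
    counts = Counter(train_tokens)
    return {word for word, _ in counts.most_common(size)}


def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens that are not in the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in vocabulary)
    return missing / len(test_tokens)


# Toy data invented for illustration: inflected forms of Czech "žena" (woman).
train = "žena ženy ženě ženu žena ženy žena".split()
test = "žena ženu ženy ženám".split()

small = select_vocabulary(train, 2)  # keeps only the two most frequent forms
large = select_vocabulary(train, 4)  # also keeps the rarer forms

print(oov_rate(small, test))  # 0.5
print(oov_rate(large, test))  # 0.25
```

In this toy setting, admitting the rarer inflected forms halves the share of unseen test tokens, which illustrates why a single lemma with many surface forms inflates the out-of-vocabulary rate in inflective languages.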