Using part of speech n-grams for improving automatic speech recognition of Polish

  • Authors:
  • Aleksander Pohl;Bartosz Ziółko

  • Affiliations:
  • Department of Electronics, AGH University of Science and Technology, Kraków, Poland, Department of Computational Linguistics, Jagiellonian University, Kraków, Poland; Department of Electronics, AGH University of Science and Technology, Kraków, Poland

  • Venue:
  • MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2013

Abstract

This paper investigates the usefulness of a part-of-speech language model for the task of automatic speech recognition. The developed model uses part-of-speech tags as categories in a category-based language model. The constructed model is used to re-score the hypotheses generated by the HTK acoustic module. The probability of a given sequence of words is estimated using n-grams with Witten-Bell backoff. The experiments presented in this paper were carried out for Polish. The best results show that the part-of-speech-only language model trained on a 1-million manually tagged corpus reduces the word error rate by more than 10 percentage points.
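
The re-scoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bigram order, the interpolated form of Witten-Bell smoothing (used here in place of a strict backoff scheme), the tag names, the `lm_weight` parameter, and the assumption that each hypothesis arrives already tagged with an acoustic log-score are all illustrative choices that the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


class WittenBellBigram:
    """Bigram model over POS tags with interpolated Witten-Bell smoothing."""

    def __init__(self, tagged_sentences):
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)
        for tags in tagged_sentences:
            seq = ["<s>"] + list(tags) + ["</s>"]
            self.unigrams.update(seq[1:])  # "<s>" is never predicted
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[prev][cur] += 1
        self.total = sum(self.unigrams.values())
        # One extra slot reserves probability mass for unseen tags.
        self.vocab_size = len(self.unigrams) + 1

    def p_unigram(self, tag):
        # Witten-Bell interpolation with a uniform distribution.
        types = len(self.unigrams)
        return (self.unigrams[tag] + types / self.vocab_size) / (self.total + types)

    def p_bigram(self, prev, tag):
        followers = self.bigrams.get(prev)
        if not followers:
            return self.p_unigram(tag)  # unseen history: use the unigram
        count = sum(followers.values())
        types = len(followers)  # distinct tags observed after `prev`
        return (followers[tag] + types * self.p_unigram(tag)) / (count + types)

    def logprob(self, tags):
        seq = ["<s>"] + list(tags) + ["</s>"]
        return sum(math.log(self.p_bigram(p, c)) for p, c in zip(seq, seq[1:]))


def rescore(hypotheses, model, lm_weight=1.0):
    """Return the hypothesis with the best combined score.

    Each hypothesis is (words, tags, acoustic_logprob); the format is
    hypothetical and only stands in for a recognizer's N-best output.
    """
    return max(hypotheses, key=lambda h: h[2] + lm_weight * model.logprob(h[1]))


if __name__ == "__main__":
    # Toy training data with made-up tag sequences.
    train = [["adj", "subst", "fin"], ["subst", "fin"], ["adj", "subst", "fin", "subst"]]
    model = WittenBellBigram(train)
    nbest = [
        (["hyp", "one"], ["subst", "fin"], -10.0),
        (["hyp", "two"], ["fin", "fin"], -9.5),
    ]
    print(rescore(nbest, model)[0])
```

In this sketch only the tag sequence is scored, which mirrors the part-of-speech-only setting mentioned in the abstract; a full category-based model would also multiply in the probability of each word given its tag.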