Statistical and linguistic clustering for language modeling in ASR

  • Authors:
  • R. Justo; I. Torres

  • Affiliations:
  • Departamento de Electricidad y Electrónica, Facultad de Ciencia y Tecnología, Universidad del País Vasco (both authors)

  • Venue:
  • CIARP'05 Proceedings of the 10th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis and Applications
  • Year:
  • 2005

Abstract

In this work, several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The proposed language models were evaluated, as usual, in terms of perplexity on the text corpus. They were then integrated into an ASR system and also evaluated in terms of system performance. The results show that category-based language models can perform better, also in terms of word error rate (WER), when the categories are obtained through statistical models rather than linguistic techniques. They also show that better system performance is obtained when the language model interpolates category-based and word-based models.
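
To illustrate the kind of model the abstract describes, the following is a minimal sketch of a linear interpolation between a word-based bigram model and a category-based bigram model, scored by perplexity. It is not the authors' implementation: the toy corpus, the hand-written `word2class` map (standing in for either a linguistic or a statistically clustered category set), the add-one smoothing, and the interpolation weight `lam = 0.5` are all assumptions made for the example.

```python
import math
from collections import Counter


def train_bigram(tokens, n_symbols):
    """Add-one smoothed bigram estimator over a closed set of n_symbols symbols."""
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))

    def prob(prev, cur):
        return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + n_symbols)

    return prob


def train_class_model(tokens, word2class):
    """Category-based bigram: P(w | w_prev) = P(w | c(w)) * P(c(w) | c(w_prev))."""
    classes = [word2class[w] for w in tokens]
    class_bigram = train_bigram(classes, len(set(word2class.values())))
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_sizes = Counter(word2class.values())  # number of word types per category

    def prob(prev, cur):
        c = word2class[cur]
        # Add-one smoothed membership probability P(w | c).
        membership = (word_counts[cur] + 1) / (class_counts[c] + class_sizes[c])
        return membership * class_bigram(word2class[prev], c)

    return prob


def interpolate(word_prob, class_prob, lam):
    """Linear interpolation: lam * P_word + (1 - lam) * P_class."""
    def prob(prev, cur):
        return lam * word_prob(prev, cur) + (1 - lam) * class_prob(prev, cur)

    return prob


def perplexity(prob, tokens):
    """Perplexity of a token sequence under a bigram probability function."""
    pairs = list(zip(tokens[:-1], tokens[1:]))
    log_sum = sum(math.log(prob(p, c)) for p, c in pairs)
    return math.exp(-log_sum / len(pairs))


# Hypothetical category map and toy corpus, for illustration only.
word2class = {
    "i": "PRON", "you": "PRON",
    "want": "VERB", "need": "VERB",
    "a": "DET", "the": "DET",
    "ticket": "NOUN", "train": "NOUN",
}
train_tokens = "i want a ticket you need the train i need a train".split()
test_tokens = "you want the ticket".split()

word_lm = train_bigram(train_tokens, len(word2class))
class_lm = train_class_model(train_tokens, word2class)
mixed_lm = interpolate(word_lm, class_lm, lam=0.5)

for name, lm in [("word", word_lm), ("class", class_lm), ("interpolated", mixed_lm)]:
    print(f"{name} perplexity: {perplexity(lm, test_tokens):.2f}")
```

On a corpus this small the perplexity figures carry no meaning; the sketch only shows how the three models relate. In practice the interpolation weight would be tuned on held-out data, and WER would be measured after integrating the model into the recognizer.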