Statistical and linguistic clustering for language modeling in ASR

Authors:
R. Justo; I. Torres
Affiliations:
Departamento de Electricidad y Electrónica, Facultad de Ciencia y Tecnología, Universidad del País Vasco; Departamento de Electricidad y Electrónica, Facultad de Ciencia y Tecnología, Universidad del País Vasco
Venue:
CIARP'05 Proceedings of the 10th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis and Applications
Year:
2005

Abstract

In this work, several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The proposed language models were evaluated, as usual, in terms of perplexity on the text corpus. They were then integrated into an automatic speech recognition (ASR) system and evaluated in terms of system performance. The results show that category-based language models can perform better, also in terms of word error rate (WER), when the categories are obtained through statistical models rather than linguistic techniques. They also show that better system performance is obtained when the language model interpolates category-based and word-based models.
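
The abstract does not give the model equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the commonly used category bigram factorization P(w | v) = P(c(w) | c(v)) · P(w | c(w)), add-one smoothing, and a linear interpolation weight `lam`; the toy corpus, the hand-made `WORD2CAT` map, and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy data and a hand-made word-to-category map (illustrative assumptions only).
TRAIN = "i want a ticket to bilbao i want a ticket to madrid".split()
TEST = "i want a ticket to madrid".split()
WORD2CAT = {"i": "PRON", "want": "VERB", "a": "DET",
            "ticket": "NOUN", "to": "PREP", "bilbao": "CITY", "madrid": "CITY"}


def bigram_counts(seq):
    """Count histories (every item except the last) and adjacent pairs."""
    return Counter(seq[:-1]), Counter(zip(seq[:-1], seq[1:]))


def make_models(train, word2cat):
    """Build a word bigram model and a category bigram model from one corpus."""
    vocab = sorted(set(train))
    cats = sorted(set(word2cat.values()))
    cat_seq = [word2cat[w] for w in train]
    w_hist, w_bi = bigram_counts(train)
    c_hist, c_bi = bigram_counts(cat_seq)
    w_uni, c_uni = Counter(train), Counter(cat_seq)

    def p_word(w, prev):
        # Word bigram with add-one smoothing over the vocabulary.
        return (w_bi[(prev, w)] + 1) / (w_hist[prev] + len(vocab))

    def p_cat(w, prev):
        # Category bigram: P(c(w) | c(prev)) * P(w | c(w)).
        c, c_prev = word2cat[w], word2cat[prev]
        trans = (c_bi[(c_prev, c)] + 1) / (c_hist[c_prev] + len(cats))
        emit = w_uni[w] / c_uni[c]  # relative frequency of w within its category
        return trans * emit

    return p_word, p_cat


def interpolate(p_a, p_b, lam):
    """Linear interpolation of two conditional models with weight lam on p_a."""
    return lambda w, prev: lam * p_a(w, prev) + (1 - lam) * p_b(w, prev)


def perplexity(prob, tokens):
    """Perplexity of a bigram model over the adjacent pairs of a token list."""
    pairs = list(zip(tokens[:-1], tokens[1:]))
    log_sum = sum(math.log(prob(w, prev)) for prev, w in pairs)
    return math.exp(-log_sum / len(pairs))


p_word, p_cat = make_models(TRAIN, WORD2CAT)
p_mix = interpolate(p_word, p_cat, 0.5)
for name, p in [("word", p_word), ("category", p_cat), ("interpolated", p_mix)]:
    print(name, round(perplexity(p, TEST), 3))
```

On a corpus this small the printed perplexities say nothing about which model is better; the sketch only shows how the three quantities named in the abstract (category-based model, interpolated model, perplexity) relate to each other. In a statistical clustering setting, `WORD2CAT` would be produced by the clustering algorithm rather than written by hand.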