In this work, several sets of categories obtained by a statistical clustering algorithm, as well as one linguistically motivated set, were used to design category-based language models. The proposed language models were first evaluated, as is customary, in terms of perplexity on the text corpus. They were then integrated into an ASR system and evaluated in terms of system performance. The results show that category-based language models perform better, in terms of WER as well, when the categories are obtained through statistical models rather than through linguistic techniques. They also show that better system performance is obtained when the language model interpolates category-based and word-based models.
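As a rough illustration of the three kinds of model compared above, here is a minimal Python sketch of a word-based bigram, a category-based bigram P(w_i | w_{i-1}) ≈ P(w_i | c(w_i)) · P(c(w_i) | c(w_{i-1})), their linear interpolation, and perplexity. It assumes a bigram order, add-alpha smoothing, a fixed interpolation weight lam = 0.5 and a hand-written word-to-category map (`word2cat`) standing in for the output of a statistical clustering algorithm. None of these choices, nor the names `BigramModels` and `perplexity`, come from the work described; its actual clustering algorithm, n-gram order, smoothing and interpolation weights are not given here.

```python
import math
from collections import Counter


class BigramModels:
    """Word-based, category-based and interpolated bigram models (toy sketch)."""

    def __init__(self, tokens, word2cat, alpha=1.0):
        self.word2cat = word2cat          # hypothetical word -> category map
        self.alpha = alpha                # add-alpha smoothing constant (assumed)
        self.vocab_size = len(word2cat)
        self.num_cats = len(set(word2cat.values()))
        self.cat_size = Counter(word2cat.values())  # number of words per category
        cats = [word2cat[w] for w in tokens]
        # Bigram counts and history counts (every position except the last).
        self.w_bi = Counter(zip(tokens, tokens[1:]))
        self.w_hist = Counter(tokens[:-1])
        self.c_bi = Counter(zip(cats, cats[1:]))
        self.c_hist = Counter(cats[:-1])
        # Unigram counts used for the membership probability P(w | c).
        self.w_uni = Counter(tokens)
        self.c_uni = Counter(cats)

    def p_word(self, prev, w):
        """Word bigram P(w | prev), add-alpha smoothed over the vocabulary."""
        num = self.w_bi[(prev, w)] + self.alpha
        return num / (self.w_hist[prev] + self.alpha * self.vocab_size)

    def p_cat(self, prev, w):
        """Category bigram P(w | c(w)) * P(c(w) | c(prev))."""
        c_prev, c = self.word2cat[prev], self.word2cat[w]
        p_trans = (self.c_bi[(c_prev, c)] + self.alpha) / (
            self.c_hist[c_prev] + self.alpha * self.num_cats)
        p_member = (self.w_uni[w] + self.alpha) / (
            self.c_uni[c] + self.alpha * self.cat_size[c])
        return p_member * p_trans

    def p_interp(self, prev, w, lam=0.5):
        """Linear interpolation of the category-based and word-based models."""
        return lam * self.p_cat(prev, w) + (1.0 - lam) * self.p_word(prev, w)


def perplexity(prob_fn, tokens):
    """exp of the average negative log-probability over the bigram predictions."""
    log_sum = sum(math.log(prob_fn(p, w)) for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))


# Toy data: the category map is hypothetical and stands in for clustering output.
train = "the cat sees the dog . a dog sees a cat .".split()
test = "a cat sees the dog .".split()
word2cat = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
            "sees": "VERB", ".": "PUNCT"}

m = BigramModels(train, word2cat)
for name, fn in [("word", m.p_word), ("category", m.p_cat),
                 ("interpolated", m.p_interp)]:
    print(f"{name:>12}: perplexity = {perplexity(fn, test):.2f}")
```

The category model shares transition statistics among all words in a category, which is what makes it less sensitive to sparse data than the word model; interpolating the two keeps word-specific detail where it is available. In practice the weight lam would be tuned on held-out data rather than fixed.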