Traditional n-gram language models are widely used in state-of-the-art large-vocabulary speech recognition systems. Despite their success, these simple models suffer from limitations such as the overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation of language modeling based on a nonparametric prior, the Pitman-Yor process. This offers a principled approach to language model smoothing that embeds the power-law distributions characteristic of natural language. Experiments on the recognition of conversational speech in multiparty meetings demonstrate that hierarchical Bayesian language models achieve significant reductions in both perplexity and word error rate.
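To make the smoothing rule concrete, here is a minimal sketch, not the paper's implementation, of the Pitman-Yor predictive probability for a bigram model. It assumes the common one-table-per-word-type simplification (t_uw = min(1, c_uw)), under which the Pitman-Yor predictive rule reduces to interpolated Kneser-Ney smoothing, and it backs off to a crudely smoothed unigram distribution rather than a full Chinese restaurant franchise. The class name SimplePitmanYorLM and the parameter defaults are illustrative assumptions.

```python
from collections import defaultdict

class SimplePitmanYorLM:
    """Sketch of Pitman-Yor smoothing for a bigram language model.

    Assumed simplification: every word type in a context sits at exactly
    one table (t_uw = min(1, c_uw)), which reduces the Pitman-Yor
    predictive rule to interpolated Kneser-Ney smoothing.
    """

    def __init__(self, discount=0.75, strength=1.0, vocab_size=10_000):
        self.d = discount            # Pitman-Yor discount parameter (0 <= d < 1)
        self.theta = strength        # Pitman-Yor strength (concentration) parameter
        self.vocab_size = vocab_size # uniform base measure assigns mass 1/V per word
        self.bigram = defaultdict(lambda: defaultdict(int))
        self.unigram = defaultdict(int)  # type counts "sent up" to the parent

    def observe(self, prev, word):
        # A new word type in this context opens a table and passes the word
        # up to the parent (unigram) restaurant, as in Kneser-Ney counting.
        if self.bigram[prev][word] == 0:
            self.unigram[word] += 1
        self.bigram[prev][word] += 1

    def prob(self, prev, word):
        # Parent distribution: unigram type counts smoothed toward uniform.
        n = sum(self.unigram.values())
        parent = (self.unigram.get(word, 0) + 1.0 / self.vocab_size) / (n + 1.0)
        ctx = self.bigram.get(prev)
        if not ctx:
            return parent
        c_u = sum(ctx.values())   # total customers (tokens) in context u
        t_u = len(ctx)            # tables = distinct word types seen in u
        c_uw = ctx.get(word, 0)
        t_uw = 1 if c_uw > 0 else 0
        # Pitman-Yor predictive rule: discounted count plus back-off mass
        # times the parent probability.
        discounted = max(c_uw - self.d * t_uw, 0.0) / (self.theta + c_u)
        backoff_mass = (self.theta + self.d * t_u) / (self.theta + c_u)
        return discounted + backoff_mass * parent

# Toy usage: train on a short token stream and query two continuations.
lm = SimplePitmanYorLM()
tokens = "the meeting starts when the chair opens the meeting".split()
for prev, word in zip(tokens, tokens[1:]):
    lm.observe(prev, word)
print(lm.prob("the", "meeting"))  # seen continuation: relatively high
print(lm.prob("the", "agenda"))   # unseen: small back-off probability
```

In the full hierarchical model, the table counts are sampled rather than fixed and the discount and strength parameters are shared per context length and inferred; they are held constant here only to keep the sketch short.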