In domains with insufficient matched training data, language models are often constructed by interpolating component models trained from partially matched corpora. Since the n-grams from such corpora may not be of equal relevance to the target domain, we propose an n-gram weighting technique to adjust the component n-gram probabilities based on features derived from readily available segmentation and metadata information for each corpus. Using a log-linear combination of such features, the resulting model achieves up to a 1.2% absolute word error rate reduction over a linearly interpolated baseline language model on a lecture transcription task.
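The abstract describes the approach only at a high level. The sketch below is a minimal Python illustration, under the assumption that each component model exposes an n-gram probability and a feature vector derived from segmentation or metadata counts; the component names, feature functions, and parameter values are hypothetical, and the renormalized log-linear weighting shown here is one plausible reading of the technique, not the paper's actual implementation.

import math

def log_linear_weight(theta, features):
    """Weighting factor exp(theta . phi) for one component n-gram."""
    return math.exp(sum(t * f for t, f in zip(theta, features)))

def weighted_interpolation(word, history, components, lambdas, theta):
    """Mixture p(w|h) proportional to
    sum_i lambda_i * exp(theta . phi_i(h, w)) * p_i(w|h),
    renormalized over the union vocabulary so it stays a distribution."""
    vocab = set().union(*(c["vocab"] for c in components))

    def score(w):
        return sum(
            lam * log_linear_weight(theta, c["features"](history, w))
                * c["prob"](history, w)
            for lam, c in zip(lambdas, components)
        )

    z = sum(score(w) for w in vocab)
    return score(word) / z

# Two toy component models (hypothetical probabilities); the single
# feature stands in for something like the fraction of corpus segments
# containing the n-gram.
textbook = {
    "vocab": {"matrix", "proof", "the"},
    "prob": lambda h, w: {"matrix": 0.5, "proof": 0.3, "the": 0.2}.get(w, 0.0),
    "features": lambda h, w: [0.8 if w != "the" else 0.1],
}
lectures = {
    "vocab": {"matrix", "okay", "the"},
    "prob": lambda h, w: {"matrix": 0.2, "okay": 0.5, "the": 0.3}.get(w, 0.0),
    "features": lambda h, w: [0.4],
}

p = weighted_interpolation("matrix", ("about",), [textbook, lectures],
                           lambdas=[0.6, 0.4], theta=[1.0])
print(f"p(matrix | about) = {p:.3f}")

In this toy setup the feature-based factor boosts n-grams that the metadata suggests are well attested in the target-like parts of a corpus; with theta set to zero the model reduces to ordinary linear interpolation.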