In a trigram language model, the probability of the next word conditioned on the previous two words is estimated from a large corpus of text. The resulting static trigram language model (STLM) has fixed probabilities that are independent of the document being dictated. To improve the language model (LM), one can adapt the trigram probabilities to match the current document more closely, since the partially dictated document provides significant clues about which words are likely to be used next. Of the many methods that can be used to adapt the LM, we describe in this paper a simple model based on the trigram frequencies estimated from the partially dictated document. We call this model a cache trigram language model (CTLM), since it caches the recent history of words. We have found that the CTLM reduces the perplexity of a dictated document by 23%. The error rate of a 20,000-word isolated word recognizer decreases by about 5% at the beginning of a document and by about 24% after a few hundred words.
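As a rough illustration of the idea, the sketch below keeps trigram counts for the words dictated so far and mixes their relative frequencies with a static trigram probability. The abstract does not say how the cache and static estimates are combined, so the linear interpolation, the `cache_weight` value, and all names here (`CacheTrigramLM`, `static_prob`, `observe`) are assumptions made for illustration, not the authors' formulation.

```python
from collections import Counter


class CacheTrigramLM:
    """Hypothetical sketch: static trigram LM adapted by a trigram cache.

    Assumption: the two estimates are combined by linear interpolation
    with a fixed weight; the source abstract does not specify this.
    """

    def __init__(self, static_prob, cache_weight=0.2):
        # static_prob is any callable (w1, w2, w3) -> P(w3 | w1, w2),
        # standing in for the static trigram model (STLM).
        self.static_prob = static_prob
        self.cache_weight = cache_weight
        self.history = []                 # words dictated so far
        self.trigram_counts = Counter()   # counts of (w1, w2, w3) in the cache
        self.context_counts = Counter()   # counts of (w1, w2) followed by a word

    def observe(self, word):
        """Add one dictated word to the cache and update the counts."""
        self.history.append(word)
        if len(self.history) >= 3:
            w1, w2, w3 = self.history[-3:]
            self.trigram_counts[(w1, w2, w3)] += 1
            self.context_counts[(w1, w2)] += 1

    def prob(self, w1, w2, w3):
        """Return the adapted estimate of P(w3 | w1, w2)."""
        p_static = self.static_prob(w1, w2, w3)
        context_total = self.context_counts[(w1, w2)]
        if context_total == 0:
            # The cache has never seen this context: use the static model alone.
            return p_static
        p_cache = self.trigram_counts[(w1, w2, w3)] / context_total
        return self.cache_weight * p_cache + (1.0 - self.cache_weight) * p_static


# Toy usage: a uniform static model over a 4-word vocabulary (an assumption).
def uniform_static(w1, w2, w3):
    return 0.25


lm = CacheTrigramLM(uniform_static, cache_weight=0.2)
for word in "the cat sat on the cat sat".split():
    lm.observe(word)

print(lm.prob("the", "cat", "sat"))  # context seen in the cache, boosted above 0.25
print(lm.prob("a", "dog", "ran"))    # unseen context, equals the static estimate
```

Falling back to the static estimate for unseen contexts keeps the distribution normalized: for a context that is in the cache, both components sum to one over the next word, so their weighted mixture does too. In practice the weight would be tuned on held-out data rather than fixed.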