Use of contexts in language model interpolation and adaptation

Authors:
X. Liu; M. J. F. Gales; P. C. Woodland
Affiliations:
Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England, United Kingdom (all three authors)
Venue:
Computer Speech and Language
Year:
2013

Abstract

Language models (LMs) are often constructed by building multiple individual component models that are combined using context-independent interpolation weights. By tuning these weights, using either perplexity or discriminative approaches, it is possible to adapt LMs to a particular task. This paper investigates the use of context-dependent weighting in both interpolation and test-time adaptation of language models. A discrete history weighting function adjusts the contribution from each component model according to the preceding word context. As this dramatically increases the number of parameters to estimate, robust weight estimation schemes are required, and several are described. The first is based on MAP estimation, where the interpolation weights of lower-order contexts are used as smoothing priors. The second uses training data to ensure robust estimation of the LM interpolation weights; these weights can also serve as a smoothing prior for MAP adaptation. A normalized perplexity metric is proposed to handle the bias of the standard perplexity criterion towards corpus size. A range of schemes for combining weight information obtained from training data and test data hypotheses is also proposed to improve robustness during context-dependent LM adaptation. In addition, a minimum Bayes risk (MBR) based discriminative training scheme is proposed, and an efficient weighted finite state transducer (WFST) decoding algorithm for context-dependent interpolation is presented. The proposed techniques were evaluated on a state-of-the-art Mandarin Chinese broadcast speech transcription task. Relative character error rate (CER) reductions of up to 7.3% were obtained, along with consistent perplexity improvements.
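
The idea of history-dependent interpolation weights smoothed towards a lower-order prior can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the prior strength `tau`, the use of context-independent weights as the only back-off level, and the specific count-smoothing update are all assumptions made for illustration. It assumes every component assigns a non-zero probability to each observed word.

```python
from collections import defaultdict


def interpolate(component_probs, weights):
    """Linear interpolation: sum over components m of weights[m] * P_m(w | h)."""
    return sum(w * p for w, p in zip(weights, component_probs))


def map_context_weights(events, prior_weights, tau=10.0, iterations=20):
    """Estimate one weight vector per history using EM with a MAP-style prior.

    events: list of (history, component_probs) pairs, where component_probs[m]
        is the probability component m gives to the observed word.
    prior_weights: weights of a lower-order context (here, hypothetically,
        the context-independent weights); must sum to 1.
    tau: hypothetical prior strength; larger values keep rarely seen
        histories closer to prior_weights.
    """
    n = len(prior_weights)
    weights = defaultdict(lambda: list(prior_weights))
    for _ in range(iterations):
        # E-step: accumulate component posteriors separately for each history.
        counts = defaultdict(lambda: [0.0] * n)
        for history, probs in events:
            w = weights[history]
            total = interpolate(probs, w)
            for m in range(n):
                counts[history][m] += w[m] * probs[m] / total
        # M-step: smooth the posterior counts with the lower-order weights.
        for history, c in counts.items():
            denom = sum(c) + tau
            weights[history] = [
                (c[m] + tau * prior_weights[m]) / denom for m in range(n)
            ]
    return dict(weights)
```

In this sketch a history observed many times moves well away from the prior, while a history observed once stays close to it, which is the robustness property the abstract attributes to using lower-order weights as smoothing priors.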