Statistical language modeling requires a large corpus from the application domain. When such a corpus is not available, language model adaptation is often used in speech recognition research. Adaptation needs only a small corpus from the application domain (the "target corpus"), but that corpus must be written in the language of the model. Even a small corpus can be difficult to collect, especially for spoken language, because collection is costly. To address this problem, this paper proposes a novel scheme that generates a small target corpus in the language of the model by machine-translating a target corpus written in another language. A statistical language model depends on information about adjacent words, and this information is stored in the translation knowledge, so machine translation can extract it for use in adaptation. In experiments, the language model improvement was about half of that obtained with a human-collected corpus, which provides an initial proof of concept.
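As a rough illustration of the adaptation idea, the sketch below combines a bigram model estimated from a general corpus with one estimated from a small target corpus. It assumes linear interpolation of maximum-likelihood bigram estimates, which is one common adaptation technique and is not necessarily the method used in the paper. The function names, the interpolation weight, and the toy sentences (standing in for machine-translated target text) are all hypothetical.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    history, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, word)] += 1
    return history, bigrams


def bigram_prob(prev, word, history, bigrams):
    """Maximum-likelihood P(word | prev); 0.0 when the history is unseen."""
    if history[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / history[prev]


def adapted_prob(prev, word, general, target, weight=0.5):
    """Linear interpolation of general and target estimates (an assumed scheme)."""
    p_general = bigram_prob(prev, word, *general)
    p_target = bigram_prob(prev, word, *target)
    return (1 - weight) * p_general + weight * p_target


# Hypothetical toy data: a general corpus and a small target corpus that
# stands in for text produced by machine translation.
general = bigram_counts(["the meeting is at noon", "the report is late"])
translated = bigram_counts(["the room is available", "the room has a view"])

# "the room" never occurs in the general corpus, so the general model alone
# assigns it zero probability; the interpolated model does not.
print(adapted_prob("the", "room", general, translated, weight=0.5))
```

In practice the interpolation weight would be tuned on held-out target-domain data, and smoothed estimates would replace the raw maximum-likelihood ratios used here.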