Toward a unified approach to statistical language modeling for Chinese
ACM Transactions on Asian Language Information Processing (TALIP)
Learning classifiers from only positive and unlabeled data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Effective measures of domain similarity for parsing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
An empirical investigation of discounting in cross-domain language models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
The RWTH Aachen machine translation system for WMT 2011
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Domain adaptation via pseudo in-domain data selection
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Does more data always yield better translations?
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Adapting translation models to translationese improves SMT
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Perplexity minimization for translation model domain adaptation in statistical machine translation
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Large, pruned or continuous space language models on a GPU for statistical machine translation
WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
Translation model based cross-lingual language model adaptation: from word models to phrase models
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Applying prediction techniques to phoneme-based AAC systems
SLPAT '12 Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies
The RWTH Aachen machine translation system for WMT 2012
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Joint WMT 2012 submission of the QUAERO project
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
LIUM's SMT machine translation systems for WMT 2012
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Selecting data for English-to-Czech machine translation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
DFKI's SMT system for WMT 2012
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Improving statistical machine translation by adapting translation models to translationese
Computational Linguistics
We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.
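
The sentence-level comparison described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the two cross-entropies are compared by taking their difference, both language models are toy unigram models with add-one smoothing, and the function names and the `threshold` parameter are hypothetical. A real system would presumably use stronger n-gram models and a tuned cutoff.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Toy unigram model: token counts plus the total token count."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-token cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    tokens = sentence.split()
    if not tokens:
        return 0.0
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)


def select_sentences(in_domain, general, threshold=0.0):
    """Keep general-pool sentences whose cross-entropy difference is below threshold.

    Score = H_in_domain(s) - H_general(s); a lower score means the sentence
    looks more like the in-domain text than like the general pool.
    Using the difference and this cutoff is an assumption of the sketch.
    """
    vocab = {t for s in in_domain + general for t in s.split()}
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    lm_in = train_unigram(in_domain)
    lm_gen = train_unigram(general)  # trained on the pool being filtered
    scored = [
        (cross_entropy(s, lm_in, vocab_size) - cross_entropy(s, lm_gen, vocab_size), s)
        for s in general
    ]
    return [s for score, s in sorted(scored) if score < threshold]


if __name__ == "__main__":
    in_domain = [
        "the patient received a dose of the drug",
        "the drug reduced the patient fever",
    ]
    general = [
        "the patient took the drug",
        "stocks fell on the market today",
        "the market rallied as stocks rose",
    ]
    for sentence in select_sentences(in_domain, general):
        print(sentence)
```

The selected sentences would then serve as training data for the auxiliary language model; lowering or raising `threshold` trades the amount of selected data against how closely it resembles the in-domain text.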