Statistical machine translation systems are usually trained on a large number of bilingual sentence pairs and translate one sentence at a time, ignoring document-level information. In this paper, we propose a cache-based approach to document-level translation. Since caches rely on relevant data to guide subsequent decisions, it is critical to fill them with highly relevant data of a reasonable size. We present three kinds of caches to store relevant document-level information: 1) a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; 2) a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs (i.e., source documents similar to the test document and their corresponding target documents) in the training parallel corpus; 3) a topic cache, which stores target-side topic words related to the test document on the source side. In particular, three new features are designed to explore the various kinds of document-level information in these three caches. Evaluation shows the effectiveness of our cache-based approach to document-level translation, with an improvement of 0.81 BLEU points over Moses. Detailed analysis and discussion are also presented to give new insights into document-level translation.
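To make the dynamic cache concrete, the following is a minimal sketch and not the paper's implementation. It assumes a bounded cache of (source phrase, target phrase) pairs that evicts the oldest pair first, and a simple binary feature that fires when a candidate phrase pair is already cached. The class name `DynamicCache`, the `max_size` parameter, and the binary form of the feature are all hypothetical choices made for illustration; the abstract does not specify how the features are computed.

```python
from collections import OrderedDict


class DynamicCache:
    """Bounded store of (source phrase, target phrase) pairs.

    Hypothetical sketch: the eviction policy and the feature definition
    are assumptions, not details taken from the paper.
    """

    def __init__(self, max_size=100):
        self.max_size = max_size
        self._pairs = OrderedDict()  # insertion order tracks recency

    def add_sentence(self, phrase_pairs):
        """Add the phrase pairs used in the best hypothesis of one sentence."""
        for pair in phrase_pairs:
            # Re-inserting an existing pair moves it to the most-recent end.
            self._pairs.pop(pair, None)
            self._pairs[pair] = True
            # Keep the cache at a reasonable size by evicting the oldest pair.
            if len(self._pairs) > self.max_size:
                self._pairs.popitem(last=False)

    def feature(self, source_phrase, target_phrase):
        """Return 1.0 if the candidate phrase pair is in the cache, else 0.0."""
        return 1.0 if (source_phrase, target_phrase) in self._pairs else 0.0

    def __len__(self):
        return len(self._pairs)


# Usage: translate a document sentence by sentence, updating the cache
# after each sentence so later sentences can reuse earlier choices.
cache = DynamicCache(max_size=2)
cache.add_sentence([("a", "x"), ("b", "y")])
hit_before = cache.feature("a", "x")
cache.add_sentence([("c", "z")])  # exceeds max_size, evicts ("a", "x")
hit_after = cache.feature("a", "x")
```

In a decoder, such a feature value would be combined with the other model scores in the usual log-linear fashion, so that phrase pairs consistent with earlier sentences of the same document receive a tunable bonus.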