We describe the experiments of the UC Berkeley team on improving English-Spanish machine translation of news text, as part of the WMT'08 Shared Translation Task. We experiment with domain adaptation, combining a small in-domain news bi-text with a large out-of-domain bi-text from the Europarl corpus by building two separate phrase translation models and two separate language models. We further add a third phrase translation model trained on a version of the news bi-text augmented with monolingual sentence-level syntactic paraphrases on the source-language side, and we combine all models in a log-linear model using minimum error rate training. Finally, we experiment with different tokenization and recasing rules, achieving a Bleu score of 35.09% on the WMT'07 news test data when translating from English to Spanish, a sizable improvement over the highest Bleu score achieved on that dataset at WMT'07: 33.10% (in fact, by our system). On the WMT'08 English-to-Spanish news translation task, we achieve a Bleu score of 21.92%, which ranks our team second by Bleu score.
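As a rough illustration of the log-linear combination mentioned above, the following minimal sketch scores candidate translations as a weighted sum of log feature values, one feature per model. The feature names, probabilities, and weights are all hypothetical and chosen only for illustration; in a real system the weights would be tuned by minimum error rate training on a development set rather than set by hand, and this is not the system's actual implementation.

```python
import math

def loglinear_score(features, weights):
    """Return sum_i weight_i * log(feature_i) for one candidate translation."""
    return sum(weights[name] * math.log(value) for name, value in features.items())

# Hypothetical model probabilities for two candidate translations:
# three phrase translation models (tm_*) and two language models (lm_*).
candidates = {
    "hyp_a": {"tm_news": 0.20, "tm_europarl": 0.05, "tm_paraphrase": 0.10,
              "lm_news": 0.02, "lm_europarl": 0.01},
    "hyp_b": {"tm_news": 0.10, "tm_europarl": 0.10, "tm_paraphrase": 0.10,
              "lm_news": 0.03, "lm_europarl": 0.01},
}

# Hypothetical uniform weights; MERT would instead search for the weights
# that maximize Bleu on held-out data.
weights = {name: 1.0 for name in candidates["hyp_a"]}

# The decoder prefers the candidate with the highest combined score.
best = max(candidates, key=lambda h: loglinear_score(candidates[h], weights))
print(best)
```

With uniform weights the score reduces to the log of the product of the model probabilities, so the ranking depends only on that product; non-uniform weights let the in-domain models count for more than the out-of-domain ones.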