Generating targeted paraphrases for improved translation

Authors: Nitin Madnani; Bonnie J. Dorr
Affiliations: Educational Testing Service, Princeton, NJ; University of Maryland, College Park, MD
Venue: ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
Year: 2013

References (26)

A statistical approach to machine translation. Computational Linguistics.
Integration of diverse recognition methodologies through reevaluation of N-best sentence hypotheses. HLT '91 Proceedings of the workshop on Speech and Natural Language.
The mathematics of statistical machine translation: parameter estimation. Computational Linguistics - Special issue on using large corpora: II.
Forest-based statistical sentence generation. NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference.
The surprise language exercises. ACM Transactions on Asian Language Information Processing (TALIP).
BLEU: a method for automatic evaluation of machine translation. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.
Statistical phrase-based translation. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1.
Minimum error rate training in statistical machine translation. ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1.
A phrase-based, joint probability model for statistical machine translation. EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10.
Paraphrasing with bilingual parallel corpora. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
The Hiero machine translation system: extensions, evaluation, and analysis. HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing.
Improved statistical machine translation using paraphrases. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics.
Paraphrasing for automatic evaluation. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics.
Hierarchical Phrase-Based Translation. Computational Linguistics.
Statistical machine translation. ACM Computing Surveys (CSUR).
Syntactic constraints on paraphrases extracted from parallel corpora. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Using paraphrases for parameter tuning in statistical machine translation. StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation.
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments. StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation.
Proceedings of the Third Workshop on Statistical Machine Translation (StatMT '08).
Further meta-evaluation of machine translation. StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation.
Feasibility of human-in-the-loop minimum error rate training. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1.
Statistical Machine Translation.
TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate. Machine Translation.
Turker-assisted paraphrasing for English-Arabic machine translation. CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
String-to-dependency statistical machine translation. Computational Linguistics.
The circle of meaning: from translation to paraphrasing and back.

Cited by (1)

Sentiment profiles of multiword expressions in test-taker essays: The case of noun-noun compounds. ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 2.

Abstract

Today's Statistical Machine Translation (SMT) systems require high-quality human translations for parameter tuning, in addition to large bitexts for learning the translation units. This parameter tuning usually involves generating translations at different points in the parameter space and obtaining feedback on how good those translations are by comparing them against human-authored reference translations. This feedback then dictates which point in the parameter space should be explored next. To measure this feedback, it is generally considered wise to have multiple (usually 4) reference translations to avoid unfairly penalizing translation hypotheses, which could easily happen given the large number of ways in which a sentence can be translated from one language to another. However, this reliance on multiple reference translations creates a problem, since such references are labor-intensive and expensive to obtain. As a result, most current MT datasets contain only a single reference, which leads to the problem of reference sparsity. In our previously published research, we proposed the first paraphrase-based solution to this problem and evaluated its effect on Chinese-English translation. In this article, we first present extended results for that solution on additional source languages. More importantly, we present a novel way to generate “targeted” paraphrases that yields substantially larger gains in translation quality (up to 2.7 BLEU points) than our previous solution (up to 1.6 BLEU points). Finally, we further validate these improvements with human preference judgments obtained via Amazon Mechanical Turk.
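The claim that a single reference can unfairly penalize a valid translation can be made concrete with BLEU's clipped ("modified") n-gram precision. The following is a minimal sketch, not the authors' tuning setup. It computes only the modified-precision component of BLEU, with no brevity penalty and no geometric mean over n-gram orders. The example sentences are invented, and the second reference (`ref2`) is a hypothetical stand-in for an additional human or paraphrased reference.

```python
from collections import Counter
from fractions import Fraction


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hypothesis, references, n):
    """BLEU-style clipped n-gram precision of one hypothesis against several references.

    Each hypothesis n-gram count is clipped to the maximum number of times that
    n-gram occurs in any single reference, so adding a reference can only keep
    or raise the score, never lower it.
    """
    hyp_counts = ngrams(hypothesis.split(), n)
    total = sum(hyp_counts.values())
    if total == 0:
        return Fraction(0)
    # For every n-gram, keep its highest count across all references.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return Fraction(clipped, total)


# Invented example: the hypothesis is a perfectly acceptable translation.
hyp = "the economy grew rapidly last year"
ref1 = "the economy expanded quickly last year"
ref2 = "the economy grew rapidly in the past year"  # hypothetical second reference

for n in (1, 2):
    single = modified_precision(hyp, [ref1], n)
    multi = modified_precision(hyp, [ref1, ref2], n)
    print(n, single, multi)
# prints:
# 1 2/3 1
# 2 2/5 4/5
```

With only `ref1`, the legitimate wording "grew rapidly" earns no credit. Adding a second reference that happens to use that wording raises unigram precision from 2/3 to 1 and bigram precision from 2/5 to 4/5. A paraphrase-based approach to reference sparsity aims to close this kind of gap without commissioning additional human translations.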