Re-evaluating machine translation results with paraphrase support

Authors:
Liang Zhou; Chin-Yew Lin; Eduard Hovy
Affiliations:
University of Southern California, Marina del Rey, CA; University of Southern California, Marina del Rey, CA; University of Southern California, Marina del Rey, CA
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Abstract

In this paper, we present ParaEval, an automatic evaluation framework that uses paraphrases to improve the quality of machine translation evaluations. Previous work has focused on fixed n-gram evaluation metrics coupled with lexical identity matching. ParaEval addresses three important issues: support for paraphrase/synonym matching, recall measurement, and correlation with human judgments. We show that ParaEval correlates significantly better than BLEU with human assessment in measurements for both fluency and adequacy.
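
The contrast the abstract draws between lexical identity matching and paraphrase-aware, recall-oriented matching can be illustrated with a minimal sketch. This is not ParaEval's actual algorithm: the function `unigram_recall`, the table `TOY_PARAPHRASES`, and the example sentences are all hypothetical, chosen only to show how allowing paraphrase matches can raise recall against a reference translation.

```python
from collections import Counter

# Hypothetical toy paraphrase table (an assumption for illustration, not data
# from the paper): each word maps to words treated as interchangeable with it.
TOY_PARAPHRASES = {
    "purchased": {"bought"},
    "bought": {"purchased"},
    "automobile": {"car"},
    "car": {"automobile"},
}


def unigram_recall(candidate, reference, paraphrases=None):
    """Return the fraction of reference tokens matched by the candidate.

    Tokens are matched by exact identity first; if a paraphrase table is
    given, remaining reference tokens may match a candidate token listed
    as their paraphrase. Each candidate token is used at most once.
    """
    paraphrases = paraphrases or {}
    available = Counter(candidate.lower().split())
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0

    matched = 0
    unmatched = []
    # Pass 1: exact lexical identity.
    for tok in ref_tokens:
        if available[tok] > 0:
            available[tok] -= 1
            matched += 1
        else:
            unmatched.append(tok)

    # Pass 2: fall back to the paraphrase table for what is left.
    for tok in unmatched:
        for alt in sorted(paraphrases.get(tok, ())):
            if available[alt] > 0:
                available[alt] -= 1
                matched += 1
                break

    return matched / len(ref_tokens)


reference = "he bought a car"
candidate = "he purchased an automobile"
exact_only = unigram_recall(candidate, reference)
with_paraphrases = unigram_recall(candidate, reference, TOY_PARAPHRASES)
print(exact_only, with_paraphrases)
```

In this toy case only "he" matches under exact identity, while the paraphrase table additionally credits "bought"/"purchased" and "car"/"automobile". A real system would need a much larger paraphrase resource and phrase-level matching, neither of which this sketch attempts.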