Fluency, adequacy, or HTER?: exploring different human judgments with a tunable MT metric

Authors:
Matthew Snover;Nitin Madnani;Bonnie J. Dorr;Richard Schwartz
Affiliations:
University of Maryland, College Park;University of Maryland, College Park;University of Maryland, College Park and Human Language Technology Center of Excellence;Human Language Technology Center of Excellence and BBN Technologies
Venue:
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Year:
2009

Abstract

Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human judgments, such as Fluency, Adequacy, and HTER, measure varying aspects of MT performance that can be captured by automatic MT metrics. We explore these differences through the use of a new tunable MT metric: TER-Plus, which extends the Translation Edit Rate evaluation metric with tunable parameters and the incorporation of morphology, synonymy and paraphrases. TER-Plus was shown to be one of the top metrics in NIST's Metrics MATR 2008 Challenge, having the highest average rank in terms of Pearson and Spearman correlation. Optimizing TER-Plus to different types of human judgments yields significantly improved correlations and meaningful changes in the weight of different types of edits, demonstrating significant differences between the types of human judgments.
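
As a rough illustration of the two ideas in the abstract (an edit-rate metric whose edit costs are tunable, and judging a metric by its correlation with human scores), the following minimal Python sketch may help. It is not the TER-Plus implementation: it omits shifts, stemming, synonym and paraphrase matching, and the weight optimization itself. The function names (`weighted_edit_rate`, `pearson`) and the parameters `w_ins`, `w_del`, and `w_sub` are hypothetical and chosen only for this example.

```python
from math import sqrt


def weighted_edit_rate(hyp, ref, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Weighted word-level edit cost from hyp to ref, divided by len(ref).

    Simplified stand-in for a TER-style score (assumption: no shifts,
    stems, synonyms, or paraphrases are modelled here).
    """
    h, r = hyp.split(), ref.split()
    # dist[i][j] = cheapest cost of turning the first i hypothesis words
    # into the first j reference words.
    dist = [[0.0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(1, len(h) + 1):
        dist[i][0] = i * w_del          # delete every hypothesis word
    for j in range(1, len(r) + 1):
        dist[0][j] = j * w_ins          # insert every reference word
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            sub_cost = 0.0 if h[i - 1] == r[j - 1] else w_sub
            dist[i][j] = min(
                dist[i - 1][j] + w_del,         # delete h[i-1]
                dist[i][j - 1] + w_ins,         # insert r[j-1]
                dist[i - 1][j - 1] + sub_cost,  # match or substitute
            )
    return dist[len(h)][len(r)] / max(len(r), 1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Under these assumptions, changing `w_ins`, `w_del`, or `w_sub` changes how strongly each kind of error is penalized. Tuning toward a given type of human judgment would then amount to searching for the weights whose scores correlate best with that judgment, for example by comparing `pearson(metric_scores, human_scores)` across candidate weight settings.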