Linguistic measures for automatic machine translation evaluation

Authors:
Jesús Giménez;Lluís Màrquez
Affiliations:
Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
Machine Translation
Year:
2010

Citing 31
Cited 4

Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
A machine learning approach to the automatic evaluation of machine translation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Precision and recall of machine translation

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
The Proposition Bank: An Annotated Corpus of Semantic Roles

Computational Linguistics
Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Dependency treelet translation: syntactically informed phrasal SMT

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
ORANGE: a method for evaluating automatic evaluation metrics for machine translation

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
BLANC: learning evaluation metrics for MT

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Paraphrasing for automatic evaluation

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Stochastic iterative alignment for machine translation evaluation

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics

HLT '02 Proceedings of the second international conference on Human Language Technology Research
Linguistically motivated large-scale NLP with C&C and boxer

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Re-evaluating machine translation results with paraphrase support

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Dependency-based automatic evaluation for machine translation

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Word error rates: decomposition over Pos classes and applications for error analysis

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
(Meta-) evaluation of machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Sentence level machine translation evaluation as a ranking problem: one step aside from BLEU

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
The role of pseudo references in MT evaluation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Textual entailment features for machine translation evaluation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Contextual bitext-derived paraphrases in automatic MT evaluation

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Manual and automatic evaluation of machine translation between European languages

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Semantic role labeling using complete syntactic analysis

CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Expected dependency pair match: predicting translation quality with expected syntactic structure

Machine Translation
Measuring machine translation quality as semantic equivalence: A metric based on entailment features

Machine Translation
TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate

Machine Translation
METEOR-NEXT and the METEOR paraphrase tables: improved evaluation support for five target languages

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Automatic projection of semantic structures: an application to pairwise translation ranking

SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
A graphical interface for MT evaluation and error analysis

ACL '12 Proceedings of the ACL 2012 System Demonstrations
Linguistic features for quality estimation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Investigating the contribution of linguistic information to quality estimation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality.