Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias into the development cycle that, in some cases, has been reported to have very negative consequences. To tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, some linguistic measures perform better than most lexical measures, while others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding substantially improved evaluation quality.
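The abstract does not specify how heterogeneous measures are combined or how the reliability of a system ranking is quantified. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each measure gives one score per system (higher is better), rescales each measure to [0, 1] with min-max normalization, and averages the normalized scores with uniform weights. It then checks the combined ranking against human scores with Spearman's rank correlation. The function names (`min_max_normalize`, `uniform_combination`, `ranks`, `spearman`) and all numbers are hypothetical toy values, not data from the source.

```python
"""Sketch: uniform combination of heterogeneous MT measures plus a
rank-correlation meta-evaluation. Assumptions (not from the source):
one score per system per measure, higher is better, min-max
normalization, uniform weights, Spearman's rho against human scores."""
from typing import Dict, List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant measure maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def uniform_combination(measures: Dict[str, List[float]]) -> List[float]:
    """Average the normalized scores of all measures, system by system."""
    normalized = [min_max_normalize(s) for s in measures.values()]
    n_systems = len(normalized[0])
    return [sum(col[i] for col in normalized) / len(normalized)
            for i in range(n_systems)]


def ranks(values: List[float]) -> List[float]:
    """Assign 1-based ranks in ascending order, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Made-up toy scores for four systems (illustrative only, not results).
toy_measures = {
    "lexical": [0.30, 0.28, 0.35, 0.25],
    "syntactic": [0.52, 0.60, 0.58, 0.41],
    "semantic": [0.44, 0.49, 0.47, 0.30],
}
toy_human = [3.1, 3.6, 3.4, 2.5]
print(round(spearman(uniform_combination(toy_measures), toy_human), 3))  # prints 0.8
```

Uniform weights keep the sketch free of any tuning data. Min-max normalization puts measures with different ranges on a common scale before they are averaged.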