Textual entailment features for machine translation evaluation

Authors:
Sebastian Padó;Michel Galley;Dan Jurafsky;Christopher D. Manning
Affiliations:
Stanford University;Stanford University;Stanford University;Stanford University
Venue:
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Year:
2009

Abstract

We present two regression models for the prediction of pairwise preference judgments among MT hypotheses. Both models are based on feature sets that are motivated by textual entailment and incorporate lexical similarity as well as local syntactic features and specific semantic phenomena. One model predicts absolute scores; the other predicts pairwise judgments directly. We find that both models are competitive with regression models built over the scores of established MT evaluation metrics. Further data analysis clarifies the complementary behavior of the two feature sets.
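
The distinction between the two model types can be illustrated with a minimal sketch. It is not the authors' implementation: the linear form, the function names, the weights, and the two-dimensional feature vectors are all hypothetical and chosen only to show how a pairwise preference can be obtained either by comparing two independently predicted scores or by predicting directly from the pair.

```python
from typing import Sequence


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear model output: weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def sign(x: float) -> int:
    """Return 1, -1, or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def prefer_via_absolute_scores(weights, feats_a, feats_b) -> int:
    """Score each hypothesis on its own, then compare the two scores.

    Returns 1 if hypothesis A is preferred, -1 if B is preferred, 0 for a tie.
    """
    return sign(dot(weights, feats_a) - dot(weights, feats_b))


def prefer_via_pairwise_model(weights, feats_a, feats_b) -> int:
    """Predict the preference directly from the pair's feature difference.

    The weights here are assumed to be trained on pairwise judgments,
    so they may differ from the weights of the absolute-score model.
    """
    diff = [a - b for a, b in zip(feats_a, feats_b)]
    return sign(dot(weights, diff))


# Hypothetical feature vectors for two MT hypotheses of the same source.
hyp_a = [0.9, 0.5]
hyp_b = [0.4, 0.7]

# Hypothetical weights for the two separately trained models.
absolute_weights = [0.6, 0.4]
pairwise_weights = [0.2, 1.0]

print(prefer_via_absolute_scores(absolute_weights, hyp_a, hyp_b))
print(prefer_via_pairwise_model(pairwise_weights, hyp_a, hyp_b))
```

With these made-up numbers the two models disagree on which hypothesis is better, which is possible whenever they are trained separately, even over the same features.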