E-rating Machine Translation

  • Authors:
  • Kristen Parton; Joel Tetreault; Nitin Madnani; Martin Chodorow

  • Affiliations:
  • Columbia University, New York, NY; Educational Testing Service, Princeton, NJ; Educational Testing Service, Princeton, NJ; Hunter College of CUNY, New York, NY

  • Venue:
  • WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
  • Year:
  • 2011

Abstract

We describe our submissions to the WMT11 shared MT evaluation task: MTeRater and MTeRater-Plus. Both are machine-learned metrics that use features from e-rater®, an automated essay scoring engine designed to assess writing proficiency. Despite using only features from e-rater, and without comparing against reference translations, MTeRater achieves a sentence-level correlation with human rankings equivalent to BLEU. Since MTeRater assesses only fluency, we build a meta-metric, MTeRater-Plus, that incorporates adequacy by combining MTeRater with other MT evaluation metrics and heuristics. This meta-metric correlates more strongly with human rankings than either MTeRater or the individual MT metrics alone. However, we also find that the e-rater features do not have a significant impact on correlation in every case.
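
The meta-metric described in the abstract combines a fluency-oriented score with adequacy-oriented metric scores through a learned model. As a rough illustration of that idea (not the authors' actual system; the feature names, toy data, and logistic-regression combiner below are assumptions made for the sketch), one could train a classifier over per-sentence metric features to predict human ranking judgments:

    # Hedged sketch of a learned meta-metric in the spirit of MTeRater-Plus.
    # Features, data, and the choice of logistic regression are illustrative
    # assumptions, not taken from the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One row per candidate translation: [fluency_score, bleu, meteor_like,
    # length_ratio]; labels mark whether human judges preferred the
    # candidate in a pairwise ranking (toy values).
    X = np.array([
        [0.82, 0.41, 0.55, 0.98],
        [0.35, 0.22, 0.30, 1.40],
        [0.74, 0.38, 0.50, 1.02],
        [0.20, 0.15, 0.25, 0.60],
    ])
    y = np.array([1, 0, 1, 0])

    meta_metric = LogisticRegression().fit(X, y)

    # Score a new candidate: a higher predicted probability plays the role
    # of a higher metric score.
    candidate = np.array([[0.65, 0.33, 0.47, 1.05]])
    print(meta_metric.predict_proba(candidate)[0, 1])

A pairwise ranker trained on feature differences between two candidates would mirror the WMT ranking setup more closely; the classifier above is kept minimal for brevity.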