Further Meta-Evaluation of Machine Translation

  • Authors:
  • Chris Callison-Burch (Johns Hopkins University), Cameron Fordyce (University of Edinburgh), Philipp Koehn (University of Edinburgh), Christof Monz (University of London), Josh Schroeder (University of Edinburgh)

  • Venue:
  • StatMT '08: Proceedings of the Third Workshop on Statistical Machine Translation
  • Year:
  • 2008

Abstract

This paper analyzes the translation quality of machine translation systems for 10 language pairs translating between Czech, English, French, German, Hungarian, and Spanish. We report the translation quality of over 30 diverse translation systems based on a large-scale manual evaluation involving hundreds of hours of effort. We use the human judgments of the systems to analyze automatic evaluation metrics for translation quality, and we report the strength of the correlation with human judgments at both the system level and the sentence level. We validate our manual evaluation methodology by measuring intra- and inter-annotator agreement, and by collecting timing information.
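
To make the system-level analysis concrete, the short Python sketch below computes Spearman's rank correlation between per-system human judgments and automatic metric scores. The scores and the no-ties simplification are illustrative assumptions, not data or code from the paper.

# Minimal sketch of system-level meta-evaluation: Spearman's rank
# correlation between an automatic metric's per-system scores and
# human judgments. All scores below are illustrative, not the paper's.

def spearman_rho(xs, ys):
    # Spearman's rho for score lists with no ties:
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
    n = len(xs)
    rank = lambda vs: {v: r for r, v in enumerate(sorted(vs), start=1)}
    rx, ry = rank(xs), rank(ys)
    d2 = sum((rx[x] - ry[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical per-system scores for five MT systems.
human  = [0.61, 0.55, 0.48, 0.40, 0.33]  # share of wins in human rankings
metric = [31.2, 29.8, 30.1, 25.4, 24.0]  # automatic metric scores

print(f"system-level rho = {spearman_rho(human, metric):.3f}")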
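
Agreement figures of the kind the paper reports are conventionally summarized with a kappa coefficient, K = (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed agreement and P(E) is the agreement expected by chance. The sketch below implements one common variant (Cohen's kappa, with chance agreement taken from each annotator's label marginals); the judgment labels and data are made up for illustration.

from collections import Counter

def cohen_kappa(ann_a, ann_b):
    # Observed agreement P(A).
    n = len(ann_a)
    p_agree = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Chance agreement P(E) from each annotator's label distribution.
    ca, cb = Counter(ann_a), Counter(ann_b)
    p_chance = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    return (p_agree - p_chance) / (1 - p_chance)

# Hypothetical relative judgments of the same outputs by two annotators.
a = ["better", "better", "worse", "equal", "better", "worse"]
b = ["better", "equal",  "worse", "equal", "better", "better"]
print(f"inter-annotator kappa = {cohen_kappa(a, b):.3f}")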