(Meta-) evaluation of machine translation

Authors:
Chris Callison-Burch; Cameron Fordyce; Philipp Koehn; Christof Monz; Josh Schroeder
Affiliations:
Johns Hopkins University; CELCT; University of Edinburgh; University of London; University of Edinburgh
Venue:
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Year:
2007

Abstract

This paper evaluates the translation quality of machine translation systems for 8 language pairs: translating French, German, Spanish, and Czech to English and back. We carried out an extensive human evaluation which allowed us not only to rank the different MT systems, but also to perform higher-level analysis of the evaluation process. We measured timing and intra- and inter-annotator agreement for three types of subjective evaluation. We measured the correlation of automatic evaluation metrics with human judgments. This meta-evaluation reveals surprising facts about the most commonly used methodologies.