Machine learning offers a systematic framework for developing metrics that use multiple criteria to assess the quality of machine translation (MT). However, learning introduces additional complexities that may affect the resulting metric's effectiveness. First, a learned metric is more reliable for translations that are similar to its training examples; this calls into question whether it is as effective at evaluating translations from systems that are not its contemporaries. Second, metrics trained on different sets of training examples may exhibit variations in their evaluations. Third, expensive developmental resources (such as translations that have been evaluated by humans) may be needed as training examples. This paper investigates these concerns in the context of using regression to develop metrics for evaluating machine-translated sentences. We track a learned metric's reliability across a five-year period to measure the extent to which it can evaluate sentences produced by other systems. We compare metrics trained under different conditions to measure their variations. Finally, we present an alternative formulation of metric training in which the features are based on comparisons against pseudo-references, reducing the demand for human-produced resources. Our results confirm that regression is a useful approach for developing new metrics for MT evaluation at the sentence level.
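To make the regression formulation concrete, the sketch below trains a sentence-level metric from human-judged translations, using features computed against pseudo-references (outputs of other MT systems for the same source sentence). This is a minimal illustration, not the paper's exact configuration: the clipped n-gram precision features, the choice of scikit-learn's SVR as the learner, and the toy data are all assumptions made for the example.

```python
# Sketch: regression-based sentence-level MT evaluation with pseudo-references.
# Feature design and learner choice are illustrative assumptions.
from collections import Counter
from sklearn.svm import SVR  # support vector regression, one plausible learner


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one (pseudo-)reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    total = sum(cand.values())
    return overlap / total if total else 0.0


def features(candidate, pseudo_refs, max_n=4):
    """One feature per (pseudo-reference, n-gram order) comparison."""
    return [ngram_precision(candidate, ref, n)
            for ref in pseudo_refs
            for n in range(1, max_n + 1)]


# Training data: candidate translations with human quality judgments, plus
# pseudo-references from other MT systems (toy examples, assumed scores).
train_candidates = ["the cat sat on the mat", "cat the mat on sat"]
train_pseudo_refs = [["a cat sat on the mat", "the cat is on the mat"],
                     ["a cat sat on the mat", "the cat is on the mat"]]
human_scores = [4.5, 1.5]  # e.g., adequacy/fluency judgments on a 1-5 scale

X = [features(c, refs) for c, refs in zip(train_candidates, train_pseudo_refs)]
model = SVR(kernel="rbf").fit(X, human_scores)

# The learned metric scores a new sentence from its pseudo-reference features.
print(model.predict([features("the cat sat on a mat",
                              ["a cat sat on the mat", "the cat is on the mat"])]))
```

A key design point of this formulation is that the pseudo-references need not be correct translations: the regression model learns how degrees of agreement with imperfect references relate to human judgments, rather than treating any single reference as ground truth.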