Investigating the contribution of linguistic information to quality estimation

Authors:
Mariano Felice;Lucia Specia
Affiliations:
Computer Laboratory, University of Cambridge, Cambridge, UK CB3 0FD;Department of Computer Science, University of Sheffield, Sheffield, UK S1 4DP
Venue:
Machine Translation
Year:
2013

Abstract

This paper describes a study on the contribution of linguistically-informed features to the task of quality estimation for machine translation at sentence level. A standard regression algorithm is used to build models using a combination of linguistic and non-linguistic features extracted from the input text and its machine translation. Experiments with three English-Spanish translation datasets show that linguistic features on their own are not able to outperform shallower features based on statistics from the input text, its translation and additional corpora. However, further analysis suggests that linguistic information can be useful to produce better results if carefully combined with other features. An in-depth analysis of the results highlights a number of issues related to the use of linguistic features.