Linguistic features for quality estimation

Authors:
Mariano Felice; Lucia Specia
Affiliations:
Linguistics University of Wolverhampton, Wolverhampton, UK; University of Sheffield, Regent Court, Sheffield, UK
Venue:
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Year:
2012

Abstract

This paper describes a study on the contribution of linguistically-informed features to the task of quality estimation for machine translation at sentence level. A standard regression algorithm is used to build models using a combination of linguistic and non-linguistic features extracted from the input text and its machine translation. Experiments with English-Spanish translations show that linguistic features, although informative on their own, are not yet able to outperform shallower features based on statistics from the input text, its translation and additional corpora. However, further analysis suggests that linguistic information is actually useful but needs to be carefully combined with other features in order to produce better results.
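
The general setup described in the abstract (features extracted from the input text and its machine translation, then fed to a regression model that predicts a sentence-level quality score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the four shallow features, the function names `shallow_features` and `fit_linear`, and the plain least-squares learner trained by gradient descent are all hypothetical stand-ins, not the feature set or the regression algorithm used in the paper.

```python
import string


def shallow_features(source: str, translation: str) -> list[float]:
    """Illustrative shallow (non-linguistic) features for one sentence pair.

    These four features are assumptions chosen for the sketch; the paper's
    actual feature set is not reproduced here.
    """
    src_tokens = source.split()
    tgt_tokens = translation.split()
    # Count ASCII punctuation characters on each side.
    src_punct = sum(ch in string.punctuation for ch in source)
    tgt_punct = sum(ch in string.punctuation for ch in translation)
    return [
        float(len(src_tokens)),                     # source length in tokens
        float(len(tgt_tokens)),                     # translation length in tokens
        len(tgt_tokens) / max(len(src_tokens), 1),  # length ratio
        float(abs(src_punct - tgt_punct)),          # punctuation mismatch
    ]


def fit_linear(X, y, lr=0.05, epochs=5000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error.

    A stand-in for "a standard regression algorithm"; any regressor
    could be substituted here.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gw / n for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predicted quality score for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

In practice the feature vectors would be built with `shallow_features` for every sentence pair, combined with linguistic features from external tools, and scaled to comparable ranges before fitting, since unscaled features can make plain gradient descent unstable.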