LORIA system for the WMT12 quality estimation shared task

Authors:
Langlois David; Raybaud Sylvain; Smaïli Kamel
Affiliations:
Université de Lorraine, Villers-lès-Nancy, France; Université de Lorraine, Villers-lès-Nancy, France; Université de Lorraine, Villers-lès-Nancy, France
Venue:
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Year:
2012

Abstract

In this paper we present the system we submitted to the WMT12 shared task on Quality Estimation. Each translated sentence is given a score between 1 and 5, computed from several numerical or Boolean features extracted from the source and target sentences. We learn a regression from the feature space to scores in the range [1, 5] using a Support Vector Machine, and we experiment with two kernels: linear and radial basis function (RBF). Our submission combines the features of the shared-task baseline system with our own, for a total of 66 features. To cope with this large number of features, we propose an in-house feature selection algorithm. Our results show that much of the information is already captured by the baseline features, and that our feature selection algorithm discards features that are linearly correlated.
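The abstract does not describe the in-house feature selection algorithm, so the sketch below is not the authors' method. It is a minimal illustration under stated assumptions: it writes out the two kernels mentioned (using the RBF form exp(-gamma * ||u - v||^2) that LIBSVM documents), and it swaps in a generic greedy correlation filter to show why linearly correlated features carry redundant information. The function names `pearson` and `drop_correlated`, the 0.95 threshold, and the toy feature values are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Linear kernel K(u, v) = u . v (dot product)."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u: Vector, v: Vector, gamma: float) -> float:
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2), as defined in LIBSVM."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def pearson(x: Vector, y: Vector) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def drop_correlated(columns: List[Vector], threshold: float = 0.95) -> List[int]:
    """Hypothetical greedy filter (not the paper's algorithm): keep a feature
    column only if its |r| with every already-kept column is below `threshold`.
    Returns the indices of the kept columns, in their original order."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy example: three hypothetical feature columns over four sentences.
features = [
    [10.0, 12.0, 7.0, 20.0],   # e.g. a length-like feature
    [20.0, 24.0, 14.0, 40.0],  # exactly twice column 0, so perfectly correlated
    [0.1, 0.9, 0.4, 0.2],      # weakly correlated with column 0
]
print(drop_correlated(features))  # column 1 is discarded as redundant
```

Under these assumptions, column 1 is an exact linear function of column 0 and is dropped, while column 2 (|r| of about 0.17 with column 0) is kept. A filter like this only removes linear redundancy; it says nothing about which features actually help the SVM regressor.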