Quality estimation for machine translation output using linguistic analysis and decoding features

Authors:
Eleftherios Avramidis
Affiliations:
German Research Center for Artificial Intelligence (DFKI), Berlin, Germany
Venue:
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Year:
2012

Citing 10
Cited 3

Estimating attributes: analysis and extensions of RELIEF

ECML-94 Proceedings of the European conference on machine learning on Machine Learning
Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Orange: from experimental machine learning to interactive data mining

PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Learning accurate, compact, and interpretable tree annotation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Confidence estimation for machine translation

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Word-Level Confidence Estimation for Machine Translation

Computational Linguistics
The WEKA data mining software: an update

ACM SIGKDD Explorations Newsletter
Evaluate with confidence estimation: machine ranking of translation outputs using grammatical features

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
e-rating machine translation

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Findings of the 2012 workshop on statistical machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Findings of the 2012 workshop on statistical machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Dimensionality reduction methods for machine translation quality estimation

Machine Translation
Investigating the contribution of linguistic information to quality estimation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

We describe a submission to the WMT12 Quality Estimation task, including an extensive Machine Learning experimentation. Data were augmented with features from linguistic analysis and statistical features from the SMT search graph. Several Feature Selection algorithms were employed. The Quality Estimation problem was addressed both as a regression task and as a discretised classification task, but the latter did not generalise well on the unseen testset. The most successful regression methods had an RMSE of 0.86 and were trained with a feature set given by Correlation-based Feature Selection. Indications that RMSE is not always sufficient for measuring performance were observed.