We develop a top-performing model for automatic, accurate, and language-independent prediction of sentence-level statistical machine translation (SMT) quality, with or without access to the translation outputs. We derive feature functions that measure the closeness of a given test sentence to the training data and the difficulty of translating it. We describe mono feature functions, which use statistics from only one side of the parallel training corpus, and duo feature functions, which incorporate statistics from both the source and target sides of the training data. Overall, we introduce novel, language-independent features that are extrinsic to the SMT system for predicting SMT performance; these features also rank highly in feature ranking evaluations. We experiment with different learning settings, with and without the translations, which helps separate the contributions of the different feature sets. We apply partial least squares (PLS) and feature subset selection, both of which improve the results, and we present a ranking of the top features selected in each learning setting, giving an exhaustive analysis of the extrinsic features used. We show that by looking only at the test source sentences, without using the translation outputs at all, we can outperform a baseline system that uses features dependent on the SMT model that generated the translations. Furthermore, when it also looks at the translation outputs, our prediction system achieves the 2nd best overall performance according to the official results of the quality estimation task (QET) challenge. Our representation and features achieve the top performance in QET among the models using support vector regression (SVR).
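The abstract describes mono feature functions, computed from only one side of the parallel training corpus, that measure how close a test sentence is to the training data. As a hypothetical illustration, not the authors' actual feature set, the sketch below computes one such closeness measure: for each n-gram order from 1 to `max_n`, the fraction of the test source sentence's n-grams that also occur in the source side of the training corpus. Whitespace tokenization and the names `ngrams`, `build_ngram_set`, and `coverage_features` are assumptions made for this example.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a token list (empty if the list is shorter than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_ngram_set(corpus: Iterable[str], max_n: int) -> Set[NGram]:
    """Collect every n-gram (1 <= n <= max_n) seen in one side of the training corpus.

    Assumption: sentences are already tokenized and separated by whitespace.
    """
    seen: Set[NGram] = set()
    for sentence in corpus:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
    return seen


def coverage_features(sentence: str, seen: Set[NGram], max_n: int) -> List[float]:
    """Hypothetical mono feature: per-order fraction of the sentence's n-grams found in training data.

    Returns one value per order n = 1..max_n; an order with no n-grams
    (sentence shorter than n tokens) gets 0.0.
    """
    tokens = sentence.split()
    feats: List[float] = []
    for n in range(1, max_n + 1):
        grams = ngrams(tokens, n)
        if not grams:
            feats.append(0.0)
            continue
        hits = sum(1 for g in grams if g in seen)
        feats.append(hits / len(grams))
    return feats


if __name__ == "__main__":
    # Toy "source side" of a training corpus (illustrative data only).
    train_src = ["the cat sat on the mat", "a dog sat on the rug"]
    seen = build_ngram_set(train_src, max_n=3)
    print(coverage_features("the dog sat on the mat", seen, max_n=3))  # [1.0, 0.8, 0.75]
```

Per-sentence vectors like this one could then be concatenated with other features and passed to a regressor such as SVR, optionally after a dimensionality reduction step such as PLS; the choice of n-gram orders and tokenization here is illustrative rather than prescribed by the abstract.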