Making large-scale support vector machine learning practical
Advances in kernel methods
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Confidence estimation for translation prediction
CONLL '03 Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003
Improving query translation with confidence estimation for cross language information retrieval
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Confidence estimation for machine translation
COLING '04 Proceedings of the 20th International Conference on Computational Linguistics
ORANGE: a method for evaluating automatic evaluation metrics for machine translation
COLING '04 Proceedings of the 20th International Conference on Computational Linguistics
Translating with non-contiguous phrases
HLT '05 Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the Second International Conference on Human Language Technology Research
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
A smorgasbord of features for automatic MT evaluation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Textual entailment features for machine translation evaluation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
PORTAGE: with smoothed phrase tables and segment choice models
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Towards predicting post-editing productivity
Machine Translation
Findings of the 2011 Workshop on Statistical Machine Translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Non-linear models for confidence estimation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Match without a referee: evaluating MT adequacy without reference translations
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Agent metaphor for machine translation mediated communication
IUI '13 Proceedings of the 2013 International Conference on Intelligent User Interfaces
Oracle decoding as a new way to analyze phrase-based machine translation
Machine Translation
Identifying useful human correction feedback from an on-line machine translation service
IJCAI '13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Quality estimation for machine translation: some lessons learned
Machine Translation
Sentence-level ranking with quality estimation
Machine Translation
Most evaluation metrics for machine translation (MT) require a reference translation for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto standard metrics, BLEU and NIST, are known to correlate well with human judgments at the corpus level, but not at the segment level. To overcome these two limitations (reference dependence and poor segment-level correlation), we address the evaluation of MT quality as a prediction task, where reference-independent features are extracted from the input sentences and their translations, and a quality score is produced by models learned from training data. We show that this approach correlates better with human judgments than commonly used metrics, even when the models are trained on different MT systems, language pairs and text domains.