DCU-Symantec submission for the WMT 2012 quality estimation task

Authors:
Raphael Rubino; Jennifer Foster; Joachim Wagner; Johann Roturier; Rasul Samad Zadeh Kaljahi; Fred Hollowood
Affiliations:
Dublin City University and Symantec, Ireland; Dublin City University; Dublin City University; Symantec, Ireland; Dublin City University and Symantec, Ireland; Symantec, Ireland
Venue:
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Year:
2012

Abstract

This paper describes the features and the machine learning methods used by Dublin City University (DCU) and Symantec for the WMT 2012 quality estimation task. Two sets of features are proposed: one constrained, i.e. respecting the data limitation suggested by the workshop organisers, and one unconstrained, i.e. using data or tools trained on data that was not provided by the workshop organisers. In total, more than 300 features were extracted and used to train classifiers to predict the translation quality of unseen data. In this paper, we focus on a subset of our feature set that we consider to be relatively novel: features based on a topic model built using the Latent Dirichlet Allocation approach, and features based on source and target language syntax extracted using part-of-speech (POS) taggers and parsers. We evaluate nine feature combinations using four classification-based and four regression-based machine learning techniques.
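
The abstract names topic-model features but does not say how they are computed. As a rough illustration only, the sketch below shows one plausible way to turn per-sentence topic distributions into features: concatenate the source and target distributions and add their cosine similarity. The function names, the three-topic vectors, and the choice of cosine similarity are all assumptions made for this example. The distributions are taken as already inferred by some topic model, and nothing here is claimed to be the authors' actual feature set.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    # Guard against all-zero vectors, which have no direction.
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def topic_features(source_topics, target_topics):
    """Hypothetical feature vector: both distributions plus their similarity."""
    similarity = cosine_similarity(source_topics, target_topics)
    return list(source_topics) + list(target_topics) + [similarity]


# Made-up 3-topic distributions for a source sentence and its translation.
source_topics = [0.7, 0.2, 0.1]
target_topics = [0.6, 0.3, 0.1]
features = topic_features(source_topics, target_topics)
```

A vector like `features` could then be appended to the other features of a sentence pair before training a classifier or regressor.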