DCU-Symantec submission for the WMT 2012 quality estimation task

  • Authors:
  • Raphael Rubino; Jennifer Foster; Joachim Wagner; Johann Roturier; Rasul Samad Zadeh Kaljahi; Fred Hollowood

  • Affiliations:
  • Dublin City University and Symantec, Ireland; Dublin City University; Dublin City University; Symantec, Ireland; Dublin City University and Symantec, Ireland; Symantec, Ireland

  • Venue:
  • WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
  • Year:
  • 2012

Abstract

This paper describes the features and the machine learning methods used by Dublin City University (DCU) and Symantec for the WMT 2012 quality estimation task. Two sets of features are proposed: one constrained, i.e., respecting the data limitation suggested by the workshop organisers, and one unconstrained, i.e., using data or tools trained on data that was not provided by the workshop organisers. In total, more than 300 features were extracted and used to train classifiers to predict the translation quality of unseen data. In this paper, we focus on a subset of our feature set that we consider to be relatively novel: features based on a topic model built using the Latent Dirichlet Allocation approach, and features based on source and target language syntax extracted using part-of-speech (POS) taggers and parsers. We evaluate nine feature combinations using four classification-based and four regression-based machine learning techniques.
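
To make the idea of topic-model features more concrete, the following is a minimal sketch of one way such features could be built. It is not the authors' implementation. It assumes that a topic model (for example, one trained with Latent Dirichlet Allocation) has already produced a topic distribution for each source sentence and each target sentence. The function names `cosine_similarity` and `topic_features`, the use of cosine similarity, and the three-topic example values are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    # An all-zero vector has no direction; treat similarity as 0.
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def topic_features(source_topics, target_topics):
    """Build a feature vector from two topic distributions.

    Assumption (not from the paper): the features are the source
    distribution, the target distribution, and their cosine similarity.
    """
    similarity = cosine_similarity(source_topics, target_topics)
    return list(source_topics) + list(target_topics) + [similarity]


# Hypothetical 3-topic distributions, invented for illustration only.
src = [0.7, 0.2, 0.1]
tgt = [0.6, 0.3, 0.1]
features = topic_features(src, tgt)
```

A feature vector of this kind could then be concatenated with other features, such as the syntax-based ones, before being passed to a classifier or regressor.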