Improving machine translation of null subjects in Italian and Spanish

  • Authors:
  • Lorenza Russo;Sharid Loáiciga;Asheesh Gulati

  • Affiliations:
  • University of Geneva, Geneva, Switzerland;University of Geneva, Geneva, Switzerland;University of Geneva, Geneva, Switzerland

  • Venue:
  • EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2012

Abstract

Null subjects are subject pronouns that are not overtly expressed; they occur in pro-drop languages such as Italian and Spanish. In this study, we quantify and compare the occurrence of this phenomenon in these two languages. We then evaluate how null subjects are translated into French, a non-pro-drop language. Using the Europarl corpus, we evaluate two MT systems on null subject translation: Its-2, a rule-based system developed at LATL, and a statistical system built with the Moses toolkit. We then add a rule-based preprocessor and a statistical post-editor to the Its-2 translation pipeline. A second evaluation of the improved Its-2 system shows an average increase of 15.46% in correct pro-drop translations for Italian-French and 12.80% for Spanish-French.