Augmenting a small parallel text with morpho-syntactic language resources for Serbian-English statistical machine translation

  • Authors:
  • Maja Popović;David Vilar;Hermann Ney;Slobodan Jovičić;Zoran Šarić

  • Affiliations:
  • RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;University of Belgrade, Serbia and Montenegro;University of Belgrade, Serbia and Montenegro

  • Venue:
  • ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
  • Year:
  • 2005

Quantified Score

Hi-index 0.01

Visualization

Abstract

In this work, we examine the quality of several statistical machine translation systems constructed on a small amount of parallel Serbian-English text. The main bilingual parallel corpus consists of about 3k sentences and 20k running words from an unrestricted domain. The translation systems are built on the full corpus as well as on a reduced corpus containing only 200 parallel sentences. A small set of about 350 short phrases from the web is used as additional bilingual knowledge. In addition, we investigate the use of monolingual morpho-syntactic knowledge i.e. base forms and POS tags.