Improving statistical word alignments with morpho-syntactic transformations

  • Authors:
  • Adrià de Gispert;Deepa Gupta;Maja Popović;Patrik Lambert;Jose B. Mariño;Marcello Federico;Hermann Ney;Rafael Banchs

  • Affiliations:
  • TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain;ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy;Lehrstuhl für Informatik 6, RWTH Aachen University, Aachen, Germany;TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain;TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain;ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy;Lehrstuhl für Informatik 6, RWTH Aachen University, Aachen, Germany;TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain

  • Venue:
  • FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
  • Year:
  • 2006

Abstract

This paper presents a wide range of statistical word alignment experiments that incorporate morphosyntactic information. By transforming the parallel corpus according to POS tagging, lemmatization, or stemming, we explore which linguistic information helps reduce the alignment error rate. To this end, alignments are evaluated against a human word alignment reference, with the aim of an improved machine translation training scheme that eventually leads to better SMT performance. Experiments are carried out on a Spanish–English European Parliament Proceedings parallel corpus, in both a large and a small data track. As expected, the improvements from introducing morphosyntactic information are larger when data is scarce, but a significant improvement is also achieved in the large data task, which means that certain linguistic knowledge is relevant even when large amounts of data are available.
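
The two ideas in the abstract, corpus transformation and evaluation against a human reference, can be illustrated with a minimal Python sketch. The abstract gives neither the exact transformations nor the scoring formula, so everything here is an assumption: the `Token` type, the `transform` modes, and the sample sentence are hypothetical, and `alignment_error_rate` uses the commonly cited definition of alignment error rate with sure and possible reference links, which may differ in detail from the paper's setup.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical annotated token; the paper's actual annotation format is not given."""
    word: str
    lemma: str
    pos: str


def transform(sentence, mode):
    """Replace every token by one representation, keeping token positions unchanged.

    Because positions are preserved, links computed on the transformed corpus
    can be read directly as links between the original words.
    """
    if mode == "word":
        return [t.word for t in sentence]
    if mode == "lemma":
        return [t.lemma for t in sentence]
    if mode == "lemma+pos":
        return [f"{t.lemma}|{t.pos}" for t in sentence]
    raise ValueError(f"unknown mode: {mode}")


def alignment_error_rate(hypothesis, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S treated as a subset of P.

    Assumed definition: A is the automatic alignment, S the sure reference
    links, P the possible reference links. Links are (source_idx, target_idx).
    """
    possible = possible | sure  # sure links always count as possible
    if not hypothesis and not sure:
        return 0.0
    matched_sure = len(hypothesis & sure)
    matched_possible = len(hypothesis & possible)
    return 1.0 - (matched_sure + matched_possible) / (len(hypothesis) + len(sure))


# Illustrative data only, not taken from the paper.
spanish = [
    Token("las", "el", "DET"),
    Token("casas", "casa", "NOUN"),
    Token("blancas", "blanco", "ADJ"),
]
lemmatized = transform(spanish, "lemma")

hypothesis = {(0, 0), (1, 1), (2, 2)}
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
aer = alignment_error_rate(hypothesis, sure, possible)
```

In this sketch the aligner itself is left out: the transformed sentences would be passed to whatever statistical alignment tool is in use, and the resulting links scored with `alignment_error_rate` against the human reference.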