Combining linguistic data views for phrase-based SMT

  • Authors:
  • Jesús Giménez; Lluís Màrquez

  • Affiliations:
  • Universitat Politècnica de Catalunya, Barcelona; Universitat Politècnica de Catalunya, Barcelona

  • Venue:
  • ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
  • Year:
  • 2005

Abstract

We describe the Spanish-to-English LDV-COMBO system for Shared Task 2, "Exploiting Parallel Texts for Statistical Machine Translation", of the ACL-2005 Workshop on "Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond". Our approach explores working with alignments at different levels of abstraction, using different degrees of linguistic annotation. Several phrase-based translation models are built from these alignments. Their combination significantly outperforms any of them in isolation. In addition, we have built a word-based translation model, based on WordNet, which is used for unknown words.