How to avoid burning ducks: combining linguistic analysis and corpus statistics for German compound processing

  • Authors:
  • Fabienne Fritzinger;Alexander Fraser

  • Affiliations:
  • University of Stuttgart;University of Stuttgart

  • Venue:
  • WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
  • Year:
  • 2010

Abstract

Compound splitting is an important problem in many NLP applications, and it must be solved in order to address data sparsity. Previous work has shown that linguistic approaches to German compound splitting produce correct splittings more often, yet corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We address this situation by combining linguistic analysis with corpus-driven statistics, obtaining better results both in producing splittings that agree with a gold standard and in statistical machine translation performance.
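
The abstract does not spell out how the two sources of evidence are combined, so the following is only a minimal sketch of one plausible reading: a linguistic analyzer proposes the admissible splittings, and corpus frequencies choose among them. The scoring function (geometric mean of part frequencies) is a common corpus-driven heuristic and is assumed here, not taken from this paper. The names `best_split`, `candidate_splits`, and `corpus_freq`, as well as all frequency counts, are hypothetical and for illustration only.

```python
from math import prod


def geometric_mean(freqs):
    """Geometric mean of a non-empty list of non-negative counts."""
    return prod(freqs) ** (1.0 / len(freqs))


def best_split(word, candidate_splits, corpus_freq):
    """Pick the highest-scoring splitting of `word`.

    candidate_splits: splittings proposed by a linguistic analyzer
        (hypothetical input; each splitting is a list of parts).
    corpus_freq: dict mapping a word form to its corpus count
        (toy counts in this sketch; unseen forms count as 0).
    """
    # The unsplit word is always an option and comes first, so that
    # max() keeps it whenever a splitting only ties with it.
    options = [[word]] + [s for s in candidate_splits if s != [word]]

    def score(parts):
        return geometric_mean([corpus_freq.get(p, 0) for p in parts])

    return max(options, key=score)


# Toy frequencies, invented for illustration.
toy_freq = {"Hausboot": 2, "Haus": 100, "Boot": 25, "Freitag": 500}

# The analyzer proposes Haus + Boot, and the corpus supports it.
split_result = best_split("Hausboot", [["Haus", "Boot"]], toy_freq)

# The analyzer proposes nothing for a non-compound, so it stays whole.
unsplit_result = best_split("Freitag", [], toy_freq)
```

Restricting the candidates to linguistically valid analyses means the frequency score can never select a splitting the analyzer rejects, which is the kind of error a purely corpus-driven splitter can make.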