SMT of Latvian, Lithuanian and Estonian Languages: a Comparative Study

  • Authors:
  • Maxim Khalilov; Lauma Pretkalniņa; Natalja Kuvaldina; Veronika Pereseina

  • Affiliations:
  • Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands; Institute of Mathematics and Computer Science, University of Latvia, Riga, Latvia; Marine Systems Institute, Tallinn University of Technology, Tallinn, Estonia; Jönköping International Business School, Jönköping University, Jönköping, Sweden

  • Venue:
  • Proceedings of the 2010 conference on Human Language Technologies -- The Baltic Perspective: Proceedings of the Fourth International Conference Baltic HLT 2010
  • Year:
  • 2010

Abstract

This paper attempts to identify the main challenges in working with the Baltic languages and Estonian, and to determine the most significant sources of errors produced by an SMT system trained on large-vocabulary parallel corpora from the legislative domain. The substantial differences between Latvian and Lithuanian on the one hand and Estonian on the other give rise to distinct sets of difficulties, which we classify and compare. In the analysis step, we move beyond automatic scores and present a human error analysis of the MT systems' output, which helps to determine the most prominent sources of errors typical of the SMT systems under consideration.