Augmenting a small parallel text with morpho-syntactic language resources for Serbian-English statistical machine translation

Authors:
Maja Popović;David Vilar;Hermann Ney;Slobodan Jovičić;Zoran Šarić
Affiliations:
RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;University of Belgrade, Serbia and Montenegro;University of Belgrade, Serbia and Montenegro
Venue:
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Year:
2005

Citing 7

Translating with Scarce Resources
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence

A systematic comparison of various statistical alignment models
Computational Linguistics

The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II

Discriminative training and maximum entropy models for statistical machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information
Computational Linguistics

Toward hierarchical models for statistical machine translation of inflected languages
DMMT '01 Proceedings of the workshop on Data-driven methods in machine translation - Volume 14

Cited 3

Speech-input multi-target machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation

Morpho-syntactic information for automatic error analysis of statistical machine translation output
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation

Using TectoMT as a preprocessing tool for phrase-based statistical machine translation
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue

Abstract

In this work, we examine the quality of several statistical machine translation systems built on a small amount of parallel Serbian-English text. The main bilingual parallel corpus consists of about 3k sentences and 20k running words from an unrestricted domain. The translation systems are built on the full corpus as well as on a reduced corpus containing only 200 parallel sentences. A small set of about 350 short phrases from the web is used as additional bilingual knowledge. In addition, we investigate the use of monolingual morpho-syntactic knowledge, i.e., base forms and POS tags.