We present a systematic comparison of preprocessing techniques for two language pairs: English-Czech and English-Hindi. The two target languages, although both belong to the Indo-European language family, differ significantly in morphology, syntax and word order. We describe how TectoMT, a successful framework for analysis and generation of language, can be used as a preprocessor for a phrase-based MT system. We compare the two language pairs and the optimal sets of source-language transformations applied to each. Examples of possible preprocessing steps include: lemmatization; retokenization; compound splitting; removing or adding words that lack counterparts in the other language; phrase reordering to resemble the target word order; and marking syntactic functions. TectoMT, like all other tools and data sets we use, is freely available on the Web.
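To make the idea of a configurable set of source-language transformations concrete, the following is a minimal sketch in Python. It is not TectoMT's API and not the authors' implementation. The function names, the toy lemma table and the part-of-speech tags are all hypothetical, and a real pipeline would take this information from a morphological analyzer and parser. The sketch chains three of the listed steps (lemmatization, removing words without a target counterpart, and reordering toward a verb-final target order) before the text would be passed to a phrase-based system.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word form, coarse part-of-speech tag)
Transform = Callable[[List[Token]], List[Token]]

# Hypothetical toy lemma table; a real pipeline would query a morphological analyzer.
TOY_LEMMAS = {"cats": "cat", "chased": "chase", "mice": "mouse"}


def lemmatize(tokens: List[Token]) -> List[Token]:
    """Lowercase each form and replace it with its lemma when the toy table has one."""
    return [(TOY_LEMMAS.get(form.lower(), form.lower()), tag) for form, tag in tokens]


def drop_determiners(tokens: List[Token]) -> List[Token]:
    """Remove words assumed to lack a counterpart in the target language."""
    return [(form, tag) for form, tag in tokens if tag != "DET"]


def move_verbs_last(tokens: List[Token]) -> List[Token]:
    """Toy reordering: keep relative order, but place verbs after all other tokens."""
    non_verbs = [t for t in tokens if t[1] != "VERB"]
    verbs = [t for t in tokens if t[1] == "VERB"]
    return non_verbs + verbs


def preprocess(tokens: List[Token], transforms: List[Transform]) -> List[Token]:
    """Apply the chosen transformations in the given order."""
    for transform in transforms:
        tokens = transform(tokens)
    return tokens


sentence = [("The", "DET"), ("cats", "NOUN"), ("chased", "VERB"),
            ("the", "DET"), ("mice", "NOUN")]
result = preprocess(sentence, [lemmatize, drop_determiners, move_verbs_last])
print(" ".join(form for form, _ in result))  # prints "cat mouse chase"
```

Because each step is an ordinary function over the token list, comparing different sets of transformations for a language pair amounts to passing a different list to `preprocess`.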