This paper presents preliminary results from, and problems encountered in, developing a statistical machine translation system from English to Turkish. Starting with a baseline word model trained on about 20K aligned sentences, we explore various ways of exploiting morphological structure to improve upon the baseline system. Because Turkish has complex agglutinative word structures, we experiment with morphologically segmented and disambiguated versions of the parallel texts in order to uncover relations between morphemes and function words in one language and morphemes and function words in the other, in addition to relations between open-class content words. Morphological segmentation on the Turkish side also conflates the statistics from allomorphs, so that sparseness is alleviated to a certain extent. We find that this approach, coupled with a simple grouping of the most frequent morphemes and function words on both sides, improves the BLEU score from the baseline of 0.0752 to 0.0913 with the small training data. We close with a discussion of why one should not expect distortion parameters to model word-local morpheme ordering, and why a new approach to handling complex morphotactics is needed.
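
As a rough illustration of the allomorph conflation mentioned above, the following minimal sketch maps surface allomorphs of the Turkish plural and locative suffixes to a single abstract tag each and compares the number of distinct token types before and after. The toy corpus, the `+PL` and `+LOC` tag names, and the helper functions are assumptions made for this example only; they are not the paper's tag set or pipeline, and the segmentation is written by hand rather than produced by a morphological analyzer.

```python
from collections import Counter

# Hand-segmented toy tokens (hypothetical data, not from the paper).
SEGMENTED = [
    ["ev", "+ler"],     # evler    'houses'
    ["kitap", "+lar"],  # kitaplar 'books'
    ["ev", "+de"],      # evde     'in the house'
    ["kitap", "+ta"],   # kitapta  'in the book'
]

# Hypothetical mapping from surface allomorphs to abstract morpheme tags.
ALLOMORPH_TO_TAG = {
    "+ler": "+PL", "+lar": "+PL",
    "+de": "+LOC", "+da": "+LOC", "+te": "+LOC", "+ta": "+LOC",
}


def conflate(tokens):
    """Replace each surface allomorph with its abstract tag; keep roots as-is."""
    return [ALLOMORPH_TO_TAG.get(tok, tok) for tok in tokens]


def type_counts(corpus):
    """Count how often each distinct token occurs across the corpus."""
    return Counter(tok for sentence in corpus for tok in sentence)


surface = type_counts(SEGMENTED)
abstract = type_counts(conflate(sentence) for sentence in SEGMENTED)

print(len(surface), "surface types;", len(abstract), "types after conflation")
```

In this toy setting each surface suffix occurs once, while each abstract tag occurs twice, which is the sense in which pooling allomorph statistics can reduce sparseness for an alignment model.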