Machine translation: past, present, future
Machine translation: past, present, future
Fast and Accurate Sentence Alignment of Bilingual Corpora
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
A systematic comparison of various statistical alignment models
Computational Linguistics
Computational Linguistics - Special issue on web as corpus
Evaluating translational correspondence using annotation projection
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Improving statistical MT through morphological analysis
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Effects of morphological analysis in translation between German and English
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Automatic diacritic restoration for resource-scarce languages
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
Statistical machine translation into a morphologically complex language
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Data-Driven part-of-speech tagging of kiswahili
TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
Subword variation in text message classification
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Exploring the sawa corpus: collection and deployment of a parallel corpus English--Swahili
Language Resources and Evaluation
Hi-index | 0.00 |
Research in data-driven methods for Machine Translation has greatly benefited from the increasing availability of parallel corpora. Processing the same text in two different languages yields useful information on how words and phrases are translated from a source language into a target language. To investigate this, a parallel corpus is typically aligned by linking linguistic tokens in the source language to the corresponding units in the target language. An aligned parallel corpus therefore facilitates the automatic development of a machine translation system and can also bootstrap annotation through projection. In this paper, we describe data collection and annotation efforts and preliminary experimental results with a parallel corpus English - Swahili.