We report on our experience building a statistical MT system from scratch, including the creation of a small parallel Tamil-English corpus, and on the results of a task-based pilot evaluation of statistical MT systems trained on sets of approximately 1,300 and approximately 5,000 parallel sentences of Tamil and English data. Our results show that even when system output appears incomprehensible, humans with no knowledge of Tamil can achieve up to 86% accuracy on topic identification, 93% recall on document retrieval, and 64% recall on question answering (with a further 14% of answers partially correct).