An analysis of the effect of training data variation in English-Persian statistical machine translation

Authors:
Mahsa Mohaghegh; Abdolhossein Sarrafzadeh
Affiliations:
Massey University, IIMS, Auckland, New Zealand; Department of Computing, Unitec, Auckland, New Zealand
Venue:
IIT '09: Proceedings of the 6th International Conference on Innovations in Information Technology
Year:
2009

Abstract

Globalization has made machine translation an attractive area of research and development. As technology opens up e-commerce opportunities, companies must overcome language barriers to reach new customers and partners. Web 2.0 tools such as Google Translate make the web more accessible. Statistical machine translation (SMT) has been applied to many language pairs, which has contributed to its popularity in recent years. However, it has not been used for the English-Persian pair. This paper presents the first such attempt and describes the problems faced in creating a parallel corpus and building a baseline system. We discuss our experience constructing the parallel corpus during this study, particularly the problems encountered in the alignment process. We then describe the prototype and its evaluation and analyze the results. The final part of the paper draws conclusions and outlines planned future work.