Persian is an Indo-European language written using Arabic script, and is an official language of Iran, Afghanistan, and Tajikistan. Transliteration of Persian to English (that is, the character-by-character mapping of a Persian word that is not readily available in a bilingual dictionary) is an unstudied problem. In this paper we make three novel contributions. First, we present performance comparisons of existing grapheme-based transliteration methods on English to Persian. Second, we discuss the difficulties in establishing a corpus for studying transliteration. Finally, we introduce a new model of Persian that takes into account the habit of shortening, or even omitting, runs of English vowels. This trait makes transliteration of Persian particularly difficult for phonetic-based methods. The new model outperforms the existing grapheme-based methods on Persian, exhibiting a 24% relative increase in transliteration accuracy measured using the top-5 criterion.
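The abstract reports its result as a relative increase in top-5 accuracy but does not spell out the computation. The following is a minimal sketch of one common reading: a source word counts as correct if any of the system's five highest-ranked candidates matches an accepted reference transliteration. The function names, the toy data, and the accuracy values 0.50 and 0.62 are hypothetical illustrations, not figures or code from the paper.

```python
from typing import Sequence, Set


def top_k_accuracy(
    ranked_outputs: Sequence[Sequence[str]],
    references: Sequence[Set[str]],
    k: int = 5,
) -> float:
    """Fraction of source words whose first k candidates include a reference.

    ranked_outputs[i] is the system's candidate list for word i, best first.
    references[i] is the set of accepted transliterations for word i.
    """
    if len(ranked_outputs) != len(references):
        raise ValueError("ranked_outputs and references must have equal length")
    if not ranked_outputs:
        return 0.0
    hits = sum(
        1
        for candidates, accepted in zip(ranked_outputs, references)
        if any(c in accepted for c in candidates[:k])
    )
    return hits / len(ranked_outputs)


def relative_increase(baseline: float, improved: float) -> float:
    """Relative (not absolute) change of `improved` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical placeholder strings standing in for transliteration candidates.
    ranked = [["a", "b"], ["c"], ["x", "y", "z"]]
    refs = [{"b"}, {"d"}, {"z"}]
    print(top_k_accuracy(ranked, refs, k=5))

    # Hypothetical accuracies: an absolute gain of 0.12 over a baseline of 0.50.
    print(relative_increase(0.50, 0.62))
```

Under this reading, a relative increase is measured against the baseline's own accuracy, so the same percentage can correspond to very different absolute gains depending on how strong the baseline is.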