English to Persian transliteration

Authors:
Sarvnaz Karimi;Andrew Turpin;Falk Scholer
Affiliations:
School of Computer Science and Information Technology, RMIT University, GPO, Melbourne, Australia;School of Computer Science and Information Technology, RMIT University, GPO, Melbourne, Australia;School of Computer Science and Information Technology, RMIT University, GPO, Melbourne, Australia
Venue:
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Year:
2006

Abstract

Persian is an Indo-European language written using Arabic script, and is an official language of Iran, Afghanistan, and Tajikistan. Transliteration of Persian to English, that is, the character-by-character mapping of a Persian word that is not readily available in a bilingual dictionary, is an unstudied problem. In this paper we make three novel contributions. First, we present performance comparisons of existing grapheme-based transliteration methods on English to Persian. Second, we discuss the difficulties in establishing a corpus for studying transliteration. Finally, we introduce a new model of Persian that takes into account the habit of shortening, or even omitting, runs of English vowels. This trait makes transliteration of Persian particularly difficult for phonetic-based methods. The new model outperforms the existing grapheme-based methods on Persian, exhibiting a 24% relative increase in transliteration accuracy as measured by the top-5 criterion.
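
The abstract reports accuracy under a top-5 criterion and a relative increase, without defining either. The sketch below shows one common reading, assumed here rather than taken from the paper: a source word counts as correct if any of its five highest-ranked candidates matches an accepted target, and the relative increase is the gain divided by the baseline accuracy. The function names, the placeholder strings, and the 0.50 and 0.62 accuracies are hypothetical and chosen only to make the arithmetic visible; they are not results from the paper.

```python
def top_k_accuracy(ranked_outputs, references, k=5):
    """Fraction of source words with an accepted target among the top-k candidates.

    ranked_outputs: dict mapping a source word to its candidate list, best first.
    references: dict mapping a source word to the set of accepted targets.
    This definition is an assumption, not the paper's stated metric.
    """
    if not ranked_outputs:
        return 0.0
    hits = 0
    for word, candidates in ranked_outputs.items():
        accepted = references.get(word, set())
        if any(candidate in accepted for candidate in candidates[:k]):
            hits += 1
    return hits / len(ranked_outputs)


def relative_increase(baseline, improved):
    """Relative gain of `improved` over `baseline`, e.g. 0.24 for a 24% increase."""
    return (improved - baseline) / baseline


# Placeholder data: "t1".."t4" stand in for target-script strings.
ranked = {"word_a": ["t1", "t2"], "word_b": ["t3"]}
refs = {"word_a": {"t2"}, "word_b": {"t4"}}

acc_top5 = top_k_accuracy(ranked, refs, k=5)  # word_a matches at rank 2
acc_top1 = top_k_accuracy(ranked, refs, k=1)  # no rank-1 candidate matches
gain = relative_increase(0.50, 0.62)          # illustrative accuracies only
```

Under this reading, a relative increase is distinct from an absolute one: moving from 0.50 to 0.62 is a 12-point absolute gain but a 24% relative gain.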