Extraction of transliteration pairs from parallel corpora using a statistical transliteration model

Authors:
Chun-Jen Lee; Jason S. Chang; Jyh-Shing Roger Jang
Affiliations:
Chun-Jen Lee: Telecommunication Labs., Chunghwa Telecom Co., Ltd., 326 Chungli, Taiwan and Department of Computer Science, National Tsing Hua University, 300 Hsinchu, Taiwan
Jason S. Chang: Department of Computer Science, National Tsing Hua University, 300 Hsinchu, Taiwan
Jyh-Shing Roger Jang: Department of Computer Science, National Tsing Hua University, 300 Hsinchu, Taiwan
Venue:
Information Sciences: an International Journal
Year:
2006

Citing 12 (works cited by this paper, listed first)
Cited 9 (works that cite this paper, listed second)

Translating collocations for bilingual lexicons: a statistical approach

Computational Linguistics
Foundations of statistical natural language processing
Adaptive Bilingual Sentence Alignment

AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
Machine transliteration

Computational Linguistics
Automatic English-Chinese name transliteration for development of multilingual resources

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Proper name translation in cross-language information retrieval

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
An algorithm for finding noun phrase correspondences in bilingual corpora

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
An English-Korean transliteration model using pronunciation and contextual rules

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Translating named entities using monolingual and bilingual resources

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Backward machine transliteration by learning phonetic similarity

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Acquisition of English-Chinese transliterated word pairs from parallel-aligned texts using a statistical machine transliteration model

HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Translating names and technical terms in Arabic text

Semitic '98 Proceedings of the Workshop on Computational Approaches to Semitic Languages

Alignment of bilingual named entities in parallel corpora using statistical models and multiple knowledge sources

ACM Transactions on Asian Language Information Processing (TALIP)
English-Arabic proper-noun transliteration-pairs creation

Journal of the American Society for Information Science and Technology
Similarity of Names Across Scripts: Edit Distance Using Learned Costs of N-Grams

GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Minimum tag error for discriminative training of conditional random fields

Information Sciences: an International Journal
Exploiting Wikipedia and EuroWordNet to solve Cross-Lingual Question Answering

Information Sciences: an International Journal
Maximum N-gram HMM-based name transliteration: experiment in NEWS 2009 on English-Chinese corpus

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Graphemic approximation of phonological context for English-Chinese transliteration

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Machine transliteration survey

ACM Computing Surveys (CSUR)
A survey of methods to ease the development of highly multilingual text mining applications

Language Resources and Evaluation

Abstract

This paper describes a framework for modeling the machine transliteration problem. The parameters of the proposed model are acquired automatically through statistical learning from a bilingual proper name list. Unlike previous approaches, the model requires neither a pronunciation dictionary for converting source words into phonetic symbols nor manually assigned phonetic similarity scores between source and target words. We also show how the model is applied to extract proper names and their corresponding transliterations from parallel corpora. Experimental results show that the average word and character precision rates are 93.8% and 97.8%, respectively.
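
As a rough illustration of the idea in the abstract (learn mapping probabilities from a bilingual name list, then use them to pick the most likely transliteration among candidates), here is a minimal Python sketch. It is not the authors' model: the training pairs are hypothetical toy data, the names are assumed to be pre-segmented and pre-aligned into units (the paper learns its parameters automatically), the target side is written in romanized form for readability, and the function names `estimate_mapping_probs`, `log_score`, and `best_candidate` are invented for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy name list. Each name is a sequence of
# (source unit, target unit) pairs. Pre-aligned units are a
# simplifying assumption made only for this sketch.
TRAINING_PAIRS = [
    [("ti", "di"), ("na", "na")],
    [("an", "an"), ("na", "na")],
    [("ti", "ti"), ("m", "mu")],
    [("to", "tuo"), ("m", "mu")],
]


def estimate_mapping_probs(pairs):
    """Relative-frequency estimate of P(target unit | source unit)."""
    counts = defaultdict(Counter)
    for name in pairs:
        for src, tgt in name:
            counts[src][tgt] += 1
    return {
        src: {tgt: n / sum(c.values()) for tgt, n in c.items()}
        for src, c in counts.items()
    }


def log_score(src_units, tgt_units, probs, floor=1e-6):
    """Log-probability of a candidate under a unit-by-unit model.

    Unseen mappings get a small floor probability (an assumption,
    standing in for proper smoothing).
    """
    if len(src_units) != len(tgt_units):
        return float("-inf")
    return sum(
        math.log(probs.get(s, {}).get(t, floor))
        for s, t in zip(src_units, tgt_units)
    )


def best_candidate(src_units, candidates, probs):
    """Return the candidate target sequence with the highest score."""
    return max(candidates, key=lambda c: log_score(src_units, c, probs))


if __name__ == "__main__":
    probs = estimate_mapping_probs(TRAINING_PAIRS)
    source = ["an", "na"]
    candidates = [["an", "na"], ["an", "mu"]]
    print(best_candidate(source, candidates, probs))
```

In an extraction setting, the candidates would be substrings of the aligned target sentence rather than a hand-written list; that step is omitted here because the abstract does not specify how candidates are generated.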