Extraction of transliteration pairs from parallel corpora using a statistical transliteration model

  • Authors:
  • Chun-Jen Lee; Jason S. Chang; Jyh-Shing Roger Jang

  • Affiliations:
  • Telecommunication Labs., Chunghwa Telecom Co., Ltd., 326 Chungli, Taiwan and Department of Computer Science, National Tsing Hua University, 300 Hsinchu, Taiwan; Department of Computer Science, National Tsing Hua University, 300 Hsinchu, Taiwan; Department of Computer Science, National Tsing Hua University, 300 Hsinchu, Taiwan

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2006

Abstract

This paper describes a framework for modeling the machine transliteration problem. The parameters of the proposed model are acquired automatically through statistical learning from a bilingual proper name list. Unlike previous approaches, the model requires neither a pronunciation dictionary for converting source words into phonetic symbols nor manually assigned phonetic similarity scores between source and target words. We also report how the model is applied to extract proper names and their corresponding transliterations from parallel corpora. Experimental results show that the average rates of word and character precision are 93.8% and 97.8%, respectively.
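
The abstract does not specify the model's form, so the following is only a minimal sketch of the general idea of learning character-level mapping probabilities from a bilingual name list without any pronunciation dictionary. It substitutes a simplified IBM Model 1 style EM procedure (no NULL token) for the authors' model, which may differ substantially. The name pairs, the romanized target strings, and the function names `train_char_model` and `score` are all hypothetical and chosen purely for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy name list (source name, romanized target name).
# Illustrative only; not data from the paper.
PAIRS = [
    ("mary", "mali"),
    ("lisa", "lisha"),
    ("sam", "shanmu"),
    ("tom", "tangmu"),
    ("lily", "lili"),
    ("tim", "dimu"),
]


def train_char_model(pairs, iterations=10):
    """Estimate P(target_char | source_char) with IBM Model 1 style EM.

    Assumption: this stands in for the paper's statistical model,
    whose actual form is not given in the abstract.
    """
    src_vocab = {c for s, _ in pairs for c in s}
    tgt_vocab = {c for _, t in pairs for c in t}
    # Start from a uniform distribution over target characters.
    prob = {s: {t: 1.0 / len(tgt_vocab) for t in tgt_vocab} for s in src_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: distribute each target character over the source characters.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[s].get(t, 0.0) for s in src)
                for s in src:
                    frac = prob[s].get(t, 0.0) / norm
                    count[s][t] += frac
                    total[s] += frac
        # M-step: renormalize the expected counts per source character.
        prob = {s: {t: c / total[s] for t, c in count[s].items()} for s in count}
    return prob


def score(prob, src, tgt, floor=1e-6):
    """Average log-likelihood per target character of tgt given src."""
    log_sum = 0.0
    for t in tgt:
        p = sum(prob.get(s, {}).get(t, 0.0) for s in src) / len(src)
        log_sum += math.log(max(p, floor))  # floor avoids log(0)
    return log_sum / len(tgt)


model = train_char_model(PAIRS)
```

Under these assumptions, a candidate pair drawn from a parallel sentence can be ranked by `score(model, source_word, target_word)`: a plausible pair such as `("lisa", "lisha")` should score higher than a mismatched pair such as `("lisa", "tangmu")`. A real extraction system would also need candidate generation and a decision threshold, which this sketch omits.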