Harvesting Regional Transliteration Variants with Guided Search

  • Authors: Jin-Shea Kuo; Haizhou Li; Chih-Lung Lin

  • Affiliations: Chung-Hwa Telecomm. Labs., Taoyuan, Taiwan; Institute for Infocomm Research, Singapore; Chung Yuan Christian University, Taoyuan, Taiwan

  • Venue: ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy

  • Year: 2009

Abstract

This paper proposes a method for harvesting regional transliteration variants with guided search. We first study how to incorporate transliteration knowledge into query formulation so as to significantly increase the chance that a search returns the desired transliterations. We then study a cross-training algorithm that draws on information shared across different regional corpora when learning transliteration models, which in turn improves overall extraction performance. The experimental results show that the proposed method not only harvests a lexicon of regional transliteration variants effectively but also reduces the need for manual data labeling in transliteration modeling. We also examine the underlying characteristics of regional transliterations that motivate the cross-training algorithm.
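
The abstract does not spell out the cross-training algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: two regional corpora each start from a small seed of known (source term, transliteration) pairs, and the model learned from one region is used to label unlabeled candidates from the other. The toy "model" (the set of characters seen in known transliterations), the names `train`, `score`, and `cross_train`, the threshold, and the placeholder strings are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a cross-training loop; not the paper's implementation.
Pair = tuple[str, str]  # (source term, candidate transliteration)


def train(positives: set[Pair]) -> set[str]:
    """Toy model (assumption): characters seen in known transliterations."""
    return {ch for _, target in positives for ch in target}


def score(model: set[str], target: str) -> float:
    """Fraction of the candidate's characters that the model has seen."""
    if not target:
        return 0.0
    return sum(ch in model for ch in target) / len(target)


def cross_train(seed_a: set[Pair], seed_b: set[Pair],
                pool_a: set[Pair], pool_b: set[Pair],
                threshold: float = 1.0, rounds: int = 3):
    """Grow each region's labeled set using the other region's model."""
    labeled_a, labeled_b = set(seed_a), set(seed_b)
    for _ in range(rounds):
        model_a, model_b = train(labeled_a), train(labeled_b)
        # Region A's candidates are judged by region B's model, and vice versa.
        new_a = {p for p in pool_a
                 if p not in labeled_a and score(model_b, p[1]) >= threshold}
        new_b = {p for p in pool_b
                 if p not in labeled_b and score(model_a, p[1]) >= threshold}
        if not new_a and not new_b:
            break  # nothing new was accepted, so further rounds cannot help
        labeled_a |= new_a
        labeled_b |= new_b
    return labeled_a, labeled_b


# Placeholder ASCII strings stand in for real transliterations.
seed_a = {("term1", "abc")}
seed_b = {("term1", "abd")}
pool_a = {("term2", "abd"), ("noise", "xyz")}
pool_b = {("term2", "cab"), ("noise", "qqq")}
lex_a, lex_b = cross_train(seed_a, seed_b, pool_a, pool_b)
```

In this toy run each region accepts the candidate whose characters the other region's model already knows and rejects the noise candidate, so neither region needs additional hand-labeled pairs beyond its seed.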