Learning transliteration lexicons from the web

  • Authors:
  • Jin-Shea Kuo; Haizhou Li; Ying-Kuei Yang

  • Affiliations:
  • Chung-Hwa Telecom Laboratories, Taiwan and National Taiwan University of Science and Technology, Taiwan; Institute for Infocomm Research, Singapore; National Taiwan University of Science and Technology, Taiwan

  • Venue:
  • ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
  • Year:
  • 2006

Abstract

This paper presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimal prior knowledge about machine transliteration and acquires knowledge iteratively from the Web. We study active learning and unsupervised learning strategies that minimize human supervision in data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. We evaluate the proposed PSM and its learning algorithm through a series of systematic experiments, which show that the proposed framework is reliably effective on two independent databases.