Harvesting Regional Transliteration Variants with Guided Search

Authors:
Jin-Shea Kuo;Haizhou Li;Chih-Lung Lin
Affiliations:
Chung-Hwa Telecomm. Labs., Taoyuan, Taiwan;Institute for Infocomm Research, Singapore;Chung Yuan Christian University, Taoyuan, Taiwan
Venue:
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
Year:
2009

Abstract

This paper proposes a method for harvesting regional transliteration variants with guided search. We first study how to incorporate transliteration knowledge into query formulation so that search results are significantly more likely to contain the desired transliterations. We then study a cross-training algorithm that draws on valuable information across different regional corpora when learning transliteration models, which in turn improves overall extraction performance. Experimental results show that the proposed method not only harvests a lexicon of regional transliteration variants effectively but also reduces the need for manual data labeling in transliteration modeling. We also examine the underlying characteristics of regional transliterations that motivate the cross-training algorithm.