Learning regional transliteration variants

Authors:
Jin-Shea Kuo; Haizhou Li
Affiliations:
Chunghwa Telecommunication Laboratories, 12, Lane 551, Min-Tsu Rd., Sec. 5, Yang-Mei, Taoyuan 326, Taiwan; Institute for Infocomm Research, 1 Fusionopolis Way, #08-05 South Tower, Connexis 138632, Singapore
Venue:
Information Processing and Management: an International Journal
Year:
2012

Abstract

This paper investigates regional transliteration variants across Chinese-speaking regions. We begin by studying the social association of regional transliterations and then propose a computational model for effectively extracting transliterations from the Web. In this model, we first propose constraint-based exploration, which incorporates transliteration knowledge from transliteration modeling and predictive query suggestions from search engines into query formulation as constraints, so as to increase the chance that searches return the desired transliterations when learning regional transliteration variants. We then study a cross-training algorithm that exploits potentially useful transliteration mappings across related regional corpora when learning transliteration models, thereby improving overall extraction performance. The experimental results show that the proposed method not only effectively harvests a lexicon of regional transliteration variants but also reduces the need for manual data labeling in transliteration modeling. We also investigate the underlying characteristics of regional transliterations that motivate the cross-training algorithm.
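The abstract describes cross-training only at a high level. As a rough illustration of the general idea, and not the authors' algorithm, the sketch below runs a toy iterative cross-training loop over two hypothetical regional corpora, "TW" and "CN". Everything in it is an assumption: the character-unigram "transliteration model", the function names `train`, `score` and `cross_train`, the smoothing constant `vocab_size`, the hand-picked acceptance `threshold`, and the example pairs. Each region's unlabeled candidate pairs are scored only by models trained on the other regions' labeled data; the best candidate that clears the threshold is moved into that region's labeled set, and the process repeats with refreshed models.

```python
import math
from collections import Counter

# Hypothetical toy "transliteration model": counts of the Chinese characters
# that appear in a region's labeled (English, Chinese) transliteration pairs.
def train(pairs):
    counts = Counter()
    for _english, chinese in pairs:
        counts.update(chinese)  # iterating a str counts each character
    return counts


def score(model, chinese, vocab_size=5000):
    """Average add-one-smoothed log-probability per character (vocab size is assumed)."""
    if not chinese:
        return float("-inf")
    total = sum(model.values())
    logp = sum(math.log((model[ch] + 1) / (total + vocab_size)) for ch in chinese)
    return logp / len(chinese)


def cross_train(labeled, pools, rounds=2, k=1, threshold=-8.0):
    """Illustrative iterative cross-training over regional corpora.

    labeled, pools: dict mapping a region name to a list of (english, chinese)
    pairs. Each region's unlabeled pool is scored by models trained on the
    OTHER regions' labeled data; up to k candidates scoring above the
    hand-picked threshold are moved into that region's labeled set.
    """
    labeled = {r: list(pairs) for r, pairs in labeled.items()}
    pools = {r: list(pairs) for r, pairs in pools.items()}
    for _ in range(rounds):
        # Models are frozen for the whole round, so region order does not matter.
        models = {r: train(pairs) for r, pairs in labeled.items()}
        for target in labeled:
            # Evidence comes only from the other regions (the "cross" view).
            others = Counter()
            for region, model in models.items():
                if region != target:
                    others.update(model)
            ranked = sorted(pools.get(target, []),
                            key=lambda p: score(others, p[1]), reverse=True)
            for pair in ranked[:k]:
                if score(others, pair[1]) > threshold:
                    pools[target].remove(pair)
                    labeled[target].append(pair)
    return labeled


# Hypothetical seed lexicons and noisy web candidates for two regions.
SEED = {
    "TW": [("Clinton", "柯林頓"), ("Kennedy", "甘迺迪")],
    "CN": [("Clinton", "克林顿"), ("Kennedy", "肯尼迪")],
}
POOL = {
    "TW": [("Lincoln", "林肯"), ("Lincoln", "總統府")],
    "CN": [("Lincoln", "林肯"), ("Lincoln", "政府网")],
}

if __name__ == "__main__":
    for region, pairs in cross_train(SEED, POOL, rounds=2).items():
        print(region, pairs)
```

On this toy data, the TW pool gains 林肯 (Lincoln) in the first round because the CN model has already seen 林 (from 克林顿) and 肯 (from 肯尼迪). The CN pool gains it only in the second round, after the TW labeled set has grown, which is the kind of back-and-forth benefit the iteration is meant to capture. The noisy candidates 總統府 and 政府网 stay below the threshold throughout.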