This paper investigates regional transliteration variants across Chinese-speaking regions. We first study the social association of regional transliterations and then propose a computational model for extracting transliterations from the Web. The model has two parts. First, we propose constraint-based exploration, which adds constraints to query formulation, drawn from transliteration modeling and from the predictive query suggestions of search engines, to increase the chance that the desired transliterations are returned when learning regional variants. Second, we study a cross-training algorithm that exploits transliteration mappings across related regional corpora when learning transliteration models, improving overall extraction performance. Experimental results show that the proposed method not only harvests a lexicon of regional transliteration variants effectively but also reduces the need for manual data labeling in transliteration modeling. We also investigate the underlying characteristics of regional transliterations that motivate the cross-training algorithm.