The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Translation of web queries using anchor text mining
ACM Transactions on Asian Language Information Processing (TALIP)
Automatic transliteration for Japanese-to-English text retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Computational Linguistics
Automatic English-Chinese name transliteration for development of multilingual resources
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
An English to Korean transliteration model of extended Markov window
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
English-to-Korean transliteration using multiple unbounded overlapping phoneme chunks
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Identification and classification of proper nouns in Chinese texts
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Learning phonetic similarity for matching named entity translations and mining new translations
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
An English-Korean transliteration model using pronunciation and contextual rules
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Translating named entities using monolingual and bilingual resources
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Using the web as a bilingual dictionary
DMMT '01 Proceedings of the workshop on Data-driven methods in machine translation - Volume 14
Extracting pronunciation-translated names from Chinese texts using bootstrapping approach
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Backward machine transliteration by learning phonetic similarity
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Learning formulation and transformation rules for multilingual named entities
MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
Transliteration of proper names in cross-lingual information retrieval
MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
Speech and Language Processing (2nd Edition)
Speech and Language Processing (2nd Edition)
A joint source-channel model for machine transliteration
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Constructing transliteration lexicons from web corpora
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Phoneme-Based transliteration of foreign names for OOV problem
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Synonymous Chinese Transliterations Retrieval from World Wide Web by Using Association Words
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
Harvesting Regional Transliteration Variants with Guided Search
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
A syllable-based name transliteration system
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Mining Synonymous Transliterations from the World Wide Web
ACM Transactions on Asian Language Information Processing (TALIP)
Transliteration mining with phonetic conflation and iterative training
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Machine transliteration survey
ACM Computing Surveys (CSUR)
Mining named entities with temporally correlated bursts from multilingual web news streams
Proceedings of the fourth ACM international conference on Web search and data mining
Improving name origin recognition with context features and unlabelled data
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Improved transliteration mining using graph reinforcement
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Transliteration mining using large training and test sets
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Hi-index | 0.01 |
This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character n-gram language model. With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of recognition followed by validation: First, in the recognition process, we identify the most probable transliteration in the k-neighborhood of a recognized English word. Then, in the validation process, we qualify the transliteration pair candidates with a hypothesis test. We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an F-measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.