Existing named entity (NE) transliteration approaches often use a single general model to transliterate NEs, regardless of their origins. As a result, a Chinese name and a French name (assuming the latter has already been rendered in Chinese) are transliterated into English with the same model, which often leads to unsatisfactory performance. In this paper we propose a cluster-specific NE transliteration framework. We group name origins into a smaller number of clusters, then train transliteration and language models for each cluster under a statistical machine translation framework. Given a source NE, we first select the appropriate models by classifying the NE into the most likely cluster, then transliterate it with the corresponding models. We also propose a phrase-based name transliteration model, which effectively incorporates context information into transliteration. Our experiments show substantial improvement in transliteration accuracy over a state-of-the-art baseline system, significantly reducing the transliteration character error rate from 50.29% to 12.84%.
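The classify-then-transliterate pipeline can be illustrated with a small sketch. The abstract does not say how origin classification or decoding are implemented, so both are assumptions here. Each cluster gets an add-one smoothed character bigram language model (the hypothetical `CharBigramLM`), and the name is routed to the cluster whose model scores it highest. The cluster's hypothetical phrase table is then applied with greedy longest-match lookup (`greedy_phrase_transliterate`). That lookup is a deliberate simplification and not the statistical MT decoder the paper uses, which would also weigh translation and language model scores. The clusters, training names, and phrase tables are toy data invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


class CharBigramLM:
    """Add-one smoothed character bigram LM, used here as a toy origin classifier (assumption)."""

    def __init__(self, names: List[str]) -> None:
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        vocab = set()
        for name in names:
            chars = ["<s>"] + list(name) + ["</s>"]
            vocab.update(chars)
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen characters

    def log_prob(self, name: str) -> float:
        chars = ["<s>"] + list(name) + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.vocab_size))
            for p, c in zip(chars, chars[1:])
        )


def classify(name: str, lms: Dict[str, CharBigramLM]) -> str:
    """Pick the cluster whose LM assigns the name the highest log-probability."""
    return max(lms, key=lambda cluster: lms[cluster].log_prob(name))


def greedy_phrase_transliterate(name: str, table: Dict[str, str]) -> str:
    """Greedy longest-match phrase lookup: a stand-in for a real SMT decoder (hypothetical)."""
    max_len = max(len(src) for src in table)
    out: List[str] = []
    i = 0
    while i < len(name):
        for length in range(min(max_len, len(name) - i), 0, -1):
            piece = name[i:i + length]
            if piece in table:
                out.append(table[piece])
                i += length
                break
        else:
            out.append(name[i])  # no phrase matches: copy the character through unchanged
            i += 1
    return " ".join(out)


def cluster_specific_transliterate(
    name: str, lms: Dict[str, CharBigramLM], tables: Dict[str, Dict[str, str]]
) -> str:
    """Classify the name into a cluster, then transliterate with that cluster's table."""
    cluster = classify(name, lms)
    return greedy_phrase_transliterate(name, tables[cluster])


# Hypothetical toy data: two origin clusters with tiny training lists and phrase tables.
TRAIN = {
    "chinese": ["张伟", "王明", "李明", "张明", "王伟"],
    "french": ["皮埃尔", "雅克", "米歇尔", "路易"],
}
TABLES = {
    "chinese": {"张": "Zhang", "王": "Wang", "李": "Li", "伟": "Wei", "明": "Ming"},
    "french": {"皮埃尔": "Pierre", "雅克": "Jacques", "米歇尔": "Michel", "路易": "Louis"},
}
LMS = {cluster: CharBigramLM(names) for cluster, names in TRAIN.items()}

if __name__ == "__main__":
    for src in ["张伟", "米歇尔"]:
        print(src, classify(src, LMS), cluster_specific_transliterate(src, LMS, TABLES))
```

With this toy data, 张伟 is routed to the `chinese` cluster and comes out as "Zhang Wei". 米歇尔 is routed to `french` and matches the multi-character phrase "Michel". Pushing 米歇尔 through the Chinese-cluster table instead only copies the characters through, which is a small-scale picture of what goes wrong when a name is handled by a model that does not match its origin.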