Multilingual applications frequently have to handle proper names, but names are often missing from bilingual lexicons. The problem is exacerbated for applications that translate between Latin-script languages and Asian languages such as Chinese, Japanese and Korean (CJK), where simple string copying is not a solution. We present a novel approach for generating the ideographic representations of a CJK name written in a Latin script. The approach first identifies the origin of the name and then back-transliterates the name to all possible Chinese characters using language-specific mappings. To reduce the massive number of candidates, we apply a three-tier filtering process: first through a set of attested bigrams, then through a set of attested terms, and finally through the WWW for validation. We illustrate the approach with English-to-Japanese back-transliteration. On test sets of Japanese given names and surnames, we achieved average precisions of 73% and 90%, respectively.
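The candidate generation and three-tier filtering described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the name is assumed to be already identified as Japanese and already segmented into romanized units, the mapping, bigram set, term set and hit counts are tiny hypothetical toy data, and the web validation step is stood in for by a caller-supplied function (`web_hits`) rather than a real search query.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Set


def candidates(units: Sequence[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Enumerate every character sequence the romanized units could map to."""
    options = [mapping.get(unit, []) for unit in units]
    return ["".join(chars) for chars in product(*options)]


def passes_bigrams(cand: str, bigrams: Set[str]) -> bool:
    """Tier 1: every adjacent character pair must be an attested bigram."""
    return all(cand[i:i + 2] in bigrams for i in range(len(cand) - 1))


def back_transliterate(
    units: Sequence[str],
    mapping: Dict[str, List[str]],
    bigrams: Set[str],
    terms: Set[str],
    web_hits: Callable[[str], int],
    min_hits: int = 1,
) -> List[str]:
    """Apply the three filters in order and return the surviving candidates."""
    tier1 = [c for c in candidates(units, mapping) if passes_bigrams(c, bigrams)]
    tier2 = [c for c in tier1 if c in terms]               # Tier 2: attested terms
    return [c for c in tier2 if web_hits(c) >= min_hits]   # Tier 3: web validation


# Hypothetical toy data, for illustration only.
TOY_MAPPING = {"ku": ["久", "九"], "bo": ["保", "母"], "ta": ["田", "多"]}
TOY_BIGRAMS = {"久保", "保田", "保多"}
TOY_TERMS = {"久保田"}
TOY_HITS = {"久保田": 1200}  # stand-in for a web hit count

result = back_transliterate(
    ["ku", "bo", "ta"],
    TOY_MAPPING,
    TOY_BIGRAMS,
    TOY_TERMS,
    lambda cand: TOY_HITS.get(cand, 0),
)
print(result)
```

For clarity the sketch enumerates the full product before filtering. With realistic mappings the number of candidates grows multiplicatively with name length, so a practical version would more likely prune against the bigram set while extending each partial candidate.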