In this paper we investigate unsupervised name transliteration using comparable corpora: corpora in which texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one using an unsupervised phonetic transliteration method and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but combining them achieves even better results. We believe that the novelty of our approach lies in the phonetic-based scoring method, which is based on a combination of carefully crafted phonetic features and empirical results from the pronunciation errors of second-language learners of English. Unlike previous approaches to transliteration, this method can in principle work with any pair of languages in the absence of a training dictionary, provided one has an estimate of the pronunciation of words in text.
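As a rough illustration of how a temporal signal and a phonetic signal might be combined to rank candidate pairs, the following sketch may help. It is not the method of the paper: the abstract does not say how temporal distributions are compared or how the two scores are merged, so the use of Pearson correlation over normalized per-day counts, the linear interpolation, the `weight` parameter, and all function names are assumptions made for this example. The phonetic similarity is taken as a precomputed number in [0, 1], since the phonetic features themselves are not described here.

```python
from math import sqrt


def normalize(counts):
    """Turn raw per-day counts into a relative-frequency distribution."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence has zero variance, so that a term
    with a flat distribution carries no temporal evidence.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def temporal_score(source_counts, target_counts):
    """Assumed temporal measure: correlation of normalized daily counts."""
    return pearson(normalize(source_counts), normalize(target_counts))


def combined_score(temporal, phonetic, weight=0.5):
    """Hypothetical combination: linear interpolation of the two scores."""
    return weight * temporal + (1.0 - weight) * phonetic


def rank_candidates(source_counts, candidates, weight=0.5):
    """Rank target-language candidates for one source-language name.

    `candidates` maps a candidate string to a pair
    (per-day counts, precomputed phonetic similarity in [0, 1]).
    """
    scored = [
        (name, combined_score(temporal_score(source_counts, counts), phon, weight))
        for name, (counts, phon) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a candidate whose mentions rise and fall on the same days as the source name, and which also sounds similar, is ranked above one that matches on only one of the two signals. The interpolation weight would need to be tuned or replaced by whatever combination scheme the full paper specifies.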