Unsupervised named entity transliteration using temporal and phonetic correlation

Authors:
Tao Tao;Su-Youn Yoon;Andrew Fister;Richard Sproat;ChengXiang Zhai
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Citing 14
Cited 18

A stochastic finite-state word-segmentation algorithm for Chinese

Computational Linguistics
Resolving ambiguity for cross-language retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text

Information Retrieval
Text-translation alignment

Computational Linguistics - Special issue on using large corpora: I
Machine transliteration

Computational Linguistics
A pattern matching method for finding noun and proper noun translations from noisy parallel corpora

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Identifying word translations in non-parallel texts

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Extraction of lexical translations from non-aligned corpora

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Minimally supervised morphological analysis by multimodal alignment

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Mining comparable bilingual text corpora for cross-language information integration

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Machine transliteration of names in Arabic text

SEMITIC '02 Proceedings of the ACL-02 workshop on Computational approaches to semitic languages
Named entity transliteration with comparable corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Phoneme-Based transliteration of foreign names for OOV problem

IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Named entity transliteration with comparable corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Mining named entity transliteration equivalents from comparable corpora

Proceedings of the 17th ACM conference on Information and knowledge management
Active sample selection for named entity transliteration

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
MINT: a method for effective and scalable mining of named entity transliterations from large comparable corpora

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Learning to match names across languages

MMIES '08 Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization
Transliteration as constrained optimization

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Learning phoneme mappings for transliteration without parallel data

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Unsupervised constraint driven learning for transliteration discovery

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Learning better transliterations

Proceedings of the 18th ACM conference on Information and knowledge management
Homophones and tonal patterns in English-Chinese transliteration

ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Transliteration alignment

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Phonological context approximation and homophone treatment for NEWS 2009 English-Chinese transliteration shared task

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Graphemic approximation of phonological context for English-Chinese transliteration

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Machine transliteration survey

ACM Computing Surveys (CSUR)
An algorithm for unsupervised transliteration mining with an application to word alignment

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Mining English-Chinese Named Entity Pairs from Comparable Corpora

ACM Transactions on Asian Language Information Processing (TALIP)
Regularized interlingual projections: evaluation on multilingual transliteration

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Accurate unsupervised joint named-entity extraction from unaligned parallel text

NEWS '12 Proceedings of the 4th Named Entity Workshop

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we investigate unsupervised name transliteration using comparable corpora, corpora where texts in the two languages deal in some of the same topics --- and therefore share references to named entities --- but are not translations of each other. We present two distinct methods for transliteration, one approach using an unsupervised phonetic transliteration method, and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining the approaches one can achieve even better results. We believe that the novelty of our approach lies in the phonetic-based scoring method, which is based on a combination of carefully crafted phonetic features, and empirical results from the pronunciation errors of second-language learners of English. Unlike previous approaches to transliteration, this method can in principle work with any pair of languages in the absence of a training dictionary, provided one has an estimate of the pronunciation of words in text.