Unsupervised language-independent name translation mining from Wikipedia infoboxes

Authors:
Wen-Pin Lin;Matthew Snover;Heng Ji
Affiliations:
City University of New York, New York, NY;City University of New York, New York, NY;City University of New York, New York, NY
Venue:
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Year:
2011

Abstract

Automatically generating entity profiles from unstructured text, as in Knowledge Base Population, creates a need in multilingual settings to align profiles across languages in an unsupervised manner. This paper describes an unsupervised, language-independent approach to mining name translation pairs from entity profiles, using Wikipedia infoboxes as a stand-in for high-quality entity profile extraction. Pairs are initially found using expressions that are written in language-independent forms (such as dates and numbers), and new translations are then mined from these pairs. The algorithm then iteratively bootstraps from these translations to learn more pairs and more translations. It maintains a high precision, over 95%, for the majority of its iterations, with a slightly lower precision of 85.9% and an F-score of 76%. As a side effect, the name mining algorithm produces, without supervision, a translation lexicon between the two languages, with an accuracy of 64%. We also replicate three state-of-the-art name translation mining methods and use two existing name translation gazetteers for comparison with our approach. The comparisons show that our approach can effectively augment the results from each of these alternative methods and resources.
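
The bootstrapping loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy infobox data, the function names (`normalize`, `mine_translations`), the regular expression for language-independent values, and the two heuristics (accept a profile pair only when one candidate has the uniquely highest overlap; accept a new translation only when exactly one unmatched value remains on each side) are all assumptions made for illustration.

```python
import re

# Assumption: a value is "language-independent" if it is made only of
# digits and common date/number punctuation, with at least one digit.
LANG_INDEP = re.compile(r"^[\d\s\-./:,]*\d[\d\s\-./:,]*$")

def normalize(value, translations):
    """Map a source-language value to a form comparable with target values."""
    if LANG_INDEP.match(value):
        return value
    return translations.get(value)  # None if no translation is known yet

def mine_translations(src_boxes, tgt_boxes, max_iters=10):
    translations = {}  # source string -> target string
    pairs = {}         # source entity -> target entity
    for _ in range(max_iters):
        before = len(translations)
        for s_name, s_vals in src_boxes.items():
            if s_name in pairs:
                continue
            s_norm = {normalize(v, translations) for v in s_vals} - {None}
            # Score each still-unpaired target profile by shared values.
            scores = {
                t_name: len(s_norm & set(t_vals))
                for t_name, t_vals in tgt_boxes.items()
                if t_name not in pairs.values()
            }
            best = max(scores.values(), default=0)
            winners = [t for t, sc in scores.items() if sc == best]
            if best == 0 or len(winners) != 1:
                continue  # no evidence yet, or an ambiguous match
            t_name = winners[0]
            pairs[s_name] = t_name
            translations[s_name] = t_name
            # Mine a new translation from the values left unmatched.
            s_left = [v for v in s_vals if normalize(v, translations) is None]
            t_left = [v for v in tgt_boxes[t_name] if v not in s_norm]
            if len(s_left) == 1 and len(t_left) == 1:
                translations[s_left[0]] = t_left[0]
        if len(translations) == before:
            break  # nothing new was learned in this pass
    return translations

# Hypothetical toy infoboxes: entity name -> list of slot values.
EN = {
    "Honolulu": ["Hawaii", "Barack Obama"],
    "Barack Obama": ["1961-08-04", "Honolulu"],
}
ZH = {
    "奥巴马": ["1961-08-04", "檀香山"],
    "檀香山": ["夏威夷", "奥巴马"],
}

result = mine_translations(EN, ZH)
```

In this toy run, the shared date is the only evidence available in the first pass, so only the two person profiles are paired. The translations learned from that pair then allow the second pass to align the two city profiles, which have no language-independent value in common.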