Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach

Authors:
Gae-Won You;Seung-Won Hwang;Young-In Song;Long Jiang;Zaiqing Nie
Affiliations:
Pohang University of Science and Technology;Pohang University of Science and Technology;Microsoft Research Asia;Microsoft Research Asia;Microsoft Research Asia
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2012

Abstract

This article studies the problem of mining entity translations, specifically English-Chinese name pairs. Existing efforts fall into two categories: (a) transliteration-based approaches, which leverage phonetic similarity, and (b) corpus-based approaches, which exploit bilingual cooccurrences. The former suffer from inaccuracy, and the latter from the scarcity of suitable bilingual corpora. In contrast, we use an under-leveraged resource: monolingual entity cooccurrences crawled from entity search engines, represented as two entity-relationship graphs, one extracted from the corpus of each language. The problem is then abstracted as finding correct mappings between the two graphs. To this end, we propose a holistic approach that exploits both transliteration similarity and monolingual cooccurrences. Because it builds on monolingual corpora, this approach complements existing corpus-based work, which requires scarce parallel or comparable corpora, while significantly boosting the accuracy of transliteration-based work. In addition, by parallelizing the mapping process on multicore architectures, we speed up the computation by more than 10 times per unit accuracy. We validated the effectiveness and efficiency of the proposed approach using real-life datasets.
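The abstract does not spell out the mapping procedure, so the following is only a minimal sketch of the general idea: combining an initial transliteration score with agreement among graph neighbors. It is not the authors' algorithm. The function names (`propagate_similarity`, `greedy_one_to_one`), the mixing weight `alpha`, the averaging rule, the greedy matching step, and all node labels and scores are assumptions made for illustration.

```python
def propagate_similarity(adj_a, adj_b, init_sim, alpha=0.5, iterations=20):
    """Iteratively blend neighbor agreement with the initial similarity.

    adj_a, adj_b: dict mapping each node to a list of its neighbors.
    init_sim: dict mapping (node_a, node_b) to an initial score, standing in
              for a transliteration similarity (hypothetical values).
    """
    nodes_a, nodes_b = list(adj_a), list(adj_b)
    sim = {(a, b): init_sim.get((a, b), 0.0) for a in nodes_a for b in nodes_b}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes_a:
            for b in nodes_b:
                na, nb = adj_a[a], adj_b[b]
                if na and nb:
                    # Average similarity over all neighbor pairs of (a, b).
                    total = sum(sim[(x, y)] for x in na for y in nb)
                    neighbor_score = total / (len(na) * len(nb))
                else:
                    neighbor_score = 0.0
                new_sim[(a, b)] = (alpha * neighbor_score
                                   + (1 - alpha) * init_sim.get((a, b), 0.0))
        sim = new_sim
    return sim


def greedy_one_to_one(sim):
    """Pick pairs in descending score order, using each node at most once."""
    mapping, used_b = {}, set()
    for (a, b), score in sorted(sim.items(), key=lambda kv: kv[1], reverse=True):
        if score <= 0:
            break  # remaining pairs carry no evidence
        if a not in mapping and b not in used_b:
            mapping[a] = b
            used_b.add(b)
    return mapping


# Toy graphs: a1-a2 cooccur in one language, b1-b2 in the other.
adj_a = {"a1": ["a2"], "a2": ["a1"], "a3": []}
adj_b = {"b1": ["b2"], "b2": ["b1"], "b3": []}
# Hypothetical transliteration scores; a2 looks slightly more like b3 than b2.
init_sim = {("a1", "b1"): 0.9, ("a2", "b2"): 0.4,
            ("a2", "b3"): 0.5, ("a3", "b3"): 0.45}

baseline = greedy_one_to_one(init_sim)
refined = greedy_one_to_one(propagate_similarity(adj_a, adj_b, init_sim))
print(baseline)
print(refined)
```

In this toy setting, transliteration scores alone pair `a2` with `b3`, whereas the propagated scores pair `a2` with `b2` because their neighbors `a1` and `b1` match strongly. Each pair's update within an iteration depends only on the previous iteration's scores, so the inner loops could in principle be split across cores; how the article actually parallelizes the mapping is not described in the abstract.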