Alignment of bilingual named entities in parallel corpora using statistical models and multiple knowledge sources

Authors:
Chun-Jen Lee;Jason S. Chang;Jyh-Shing R. Jang
Affiliations:
National Tsing Hua University, Hsinchu, Taiwan;National Tsing Hua University, Hsinchu, Taiwan;National Tsing Hua University, Hsinchu, Taiwan
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2006



Abstract

Named entity (NE) extraction is a fundamental task in natural language processing (NLP). Although many studies have focused on identifying NEs in monolingual documents, aligning NEs in bilingual documents has received far less attention because of the complexity of the task. In this article, we introduce a new approach to aligning bilingual NEs in parallel corpora that combines statistical models with multiple knowledge sources. We model the translation of an English NE phrase into its Chinese equivalent using lexical translation and transliteration probabilities for word translation, and alignment probabilities for word reordering. The method automatically learns phrase alignments and acquires word translations from a bilingual phrase dictionary and parallel corpora, and automatically discovers transliteration transformations from a training set of name-transliteration pairs. It also uses language-specific knowledge functions that handle abbreviations, recognize Chinese personal names, and expand acronyms. At runtime, the models are applied to each source NE in a pair of bilingual sentences to generate and evaluate target NE candidates; the source and target NEs are then aligned according to the computed probabilities. Experimental results show that the proposed approach, which integrates statistical models with additional knowledge sources, is feasible and significantly outperforms both our previous work and the traditional IBM Model 4 approach.
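
The runtime step described in the abstract (generate target NE candidates, score them, keep the most probable) can be illustrated with a small sketch. This is not the authors' model. It assumes a toy lexical table (`LEX_PROB`), a smoothing floor (`FLOOR`), a pre-segmented Chinese sentence, and a simplified score that uses only lexical probabilities and ignores the alignment (reordering) probabilities and the knowledge functions mentioned in the abstract. All names and probability values are hypothetical.

```python
# Assumed smoothing value for word pairs missing from the table (hypothetical).
FLOOR = 1e-6

# Hypothetical lexical translation/transliteration table:
# (English word, Chinese token) -> probability. Values are made up.
LEX_PROB = {
    ("National", "國立"): 0.8,
    ("Tsing", "清華"): 0.6,
    ("Hua", "清華"): 0.6,
    ("University", "大學"): 0.9,
}


def score_candidate(source_ne, candidate):
    """Score a candidate span with a simplified, lexical-only measure.

    Every source word should be explained by some candidate token, and
    every candidate token by some source word, so both missing and
    extra tokens are penalized through FLOOR.
    """
    score = 1.0
    for e in source_ne:
        score *= max(LEX_PROB.get((e, c), FLOOR) for c in candidate)
    for c in candidate:
        score *= max(LEX_PROB.get((e, c), FLOOR) for e in source_ne)
    return score


def align_ne(source_ne, target_tokens, max_len=4):
    """Enumerate contiguous target spans and return the best-scoring one."""
    best_span, best_score = None, 0.0
    n = len(target_tokens)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            candidate = target_tokens[start:end]
            s = score_candidate(source_ne, candidate)
            if s > best_score:
                best_span, best_score = candidate, s
    return best_span, best_score


source_ne = ["National", "Tsing", "Hua", "University"]
target_sentence = ["他", "就讀", "國立", "清華", "大學"]  # assumed segmentation
best_span, best_score = align_ne(source_ne, target_sentence)
print(best_span, best_score)
```

With these toy values, the span 國立 清華 大學 receives the highest score, because any shorter or longer span leaves a source word or a target token matched only at the floor probability. A fuller treatment would replace the lexical-only score with the translation, transliteration, and alignment probabilities that the article describes.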