Mining new word translations from comparable corpora

Authors:
Li Shao;Hwee Tou Ng
Affiliations:
National University of Singapore, Singapore;National University of Singapore, Singapore
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Citing 10
Cited 24

A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Machine transliteration

Computational Linguistics
An IR approach for translating new words from nonparallel, comparable texts

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Identifying word translations in non-parallel texts

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Automatic identification of word translations from unrelated English and German corpora

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Learning translations of named-entity phrases from parallel corpora

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Base Noun Phrase translation using web data and the EM algorithm

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Translating named entities using monolingual and bilingual resources

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Learning a translation lexicon from monolingual corpora

ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
Machine transliteration of names in Arabic text

SEMITIC '02 Proceedings of the ACL-02 workshop on Computational approaches to semitic languages

Named entity translation matching and learning: With application for mining unseen translations

ACM Transactions on Information Systems (TOIS)
Extracting parallel sub-sentential fragments from non-parallel corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A Structure-Based Model for Chinese Organization Name Translation

ACM Transactions on Asian Language Information Processing (TALIP)
Semi-supervised lexicon mining from parenthetical expressions in monolingual web pages

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Named entity translation with web mining and transliteration

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Mining bilingual data from the web with adaptively learnt patterns

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Learning Spanish-Galician translation equivalents using a comparable corpus and a bilingual dictionary

CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Mining name translations from entity graph mapping

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Revisiting context-based projection methods for term-translation spotting in comparable corpora

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Fusion of multiple features and ranking SVM for web-based English-Chinese OOV term translation

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
A multi-view approach for term translation spotting

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Building and using comparable corpora for domain-specific bilingual lexicon extraction

BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Cross-lingual slot filling from comparable corpora

BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Mining English-Chinese Named Entity Pairs from Comparable Corpora

ACM Transactions on Asian Language Information Processing (TALIP)
Bootstrapping bilingual lexicons from comparable corpora for closely related languages

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Mining entity translations from comparable corpora: a holistic graph mapping approach

Proceedings of the 20th ACM international conference on Information and knowledge management
Unsupervised language-independent name translation mining from Wikipedia infoboxes

EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Mining parenthetical translations for polish-english lexica

CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Automatic generation of bilingual dictionaries using intermediary languages and comparable corpora

CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Extraction of bilingual cognates from wikipedia

PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language
Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach

ACM Transactions on Information Systems (TOIS)
Bilingual lexicon extraction from comparable corpora using label propagation

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
A joint model to identify and align bilingual named entities

Computational Linguistics
Statistical machine translation enhancements through linguistic levels: A survey

ACM Computing Surveys (CSUR)

Quantified Score

Hi-index	0.00

Visualization

Abstract

New words such as names, technical terms, etc appear frequently. As such, the bilingual lexicon of a machine translation system has to be constantly updated with these new word translations. Comparable corpora such as news documents of the same period from different news agencies are readily available. In this paper, we present a new approach to mining new word translations from comparable corpora, by using context information to complement transliteration information. We evaluated our approach on six months of Chinese and English Gigaword corpora, with encouraging results.