Looking for candidate translational equivalents in specialized, comparable corpora

Authors:
Yun-Chuang Chiao; Pierre Zweigenbaum
Affiliations:
Hôpitaux de Paris, Université Paris; Hôpitaux de Paris, Université Paris
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2
Year:
2002

Abstract

Previous attempts at identifying translational equivalents in comparable corpora have dealt with very large 'general language' corpora and words. We address this task in a specialized domain, medicine, starting from smaller non-parallel, comparable corpora and an initial bilingual medical lexicon. We compare the distributional contexts of source and target words, testing several weighting factors and similarity measures. On a test set of frequently occurring words, for the best combination (the Jaccard similarity measure with or without tf.idf weighting), the correct translation is ranked first for 20% of our test words, and is found in the top 10 candidates for 50% of them. An additional reverse-translation filtering step improves the precision of the top candidate translation up to 74%, with a 33% recall.
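
The context-comparison step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the window size, the raw co-occurrence counts used as weights (the abstract mentions tf.idf as one option), and the min/max generalization of the Jaccard measure to weighted vectors. The toy sentences and seed lexicon are invented. The reverse-translation filtering step is not shown.

```python
from collections import Counter

def context_vector(tokens, word, window=3):
    """Count the words that co-occur with `word` within +/- `window` tokens."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec

def translate_vector(vec, lexicon):
    """Map a source-language context vector into the target language.

    `lexicon` is a hypothetical seed dictionary {source word: [target words]};
    context words missing from it are dropped.
    """
    out = Counter()
    for w, count in vec.items():
        for t in lexicon.get(w, ()):
            out[t] += count
    return out

def weighted_jaccard(u, v):
    """Weighted Jaccard: sum of minima over sum of maxima (assumed variant)."""
    keys = set(u) | set(v)
    num = sum(min(u.get(k, 0), v.get(k, 0)) for k in keys)
    den = sum(max(u.get(k, 0), v.get(k, 0)) for k in keys)
    return num / den if den else 0.0

def rank_candidates(src_word, src_tokens, tgt_tokens, lexicon, candidates, window=3):
    """Rank target-language candidates by context similarity to `src_word`."""
    src_vec = translate_vector(context_vector(src_tokens, src_word, window), lexicon)
    scored = [
        (weighted_jaccard(src_vec, context_vector(tgt_tokens, c, window)), c)
        for c in candidates
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    return [c for _, c in sorted(scored, key=lambda sc: (-sc[0], sc[1]))]

# Toy usage with invented French and English "corpora" and seed lexicon.
fr = "le patient présente une fièvre élevée et une toux".split()
en = "the patient has a high fever and a cough".split()
seed = {"patient": ["patient"], "élevée": ["high"], "toux": ["cough"], "et": ["and"]}
ranking = rank_candidates("fièvre", fr, en, seed, ["cough", "fever"], window=2)
```

In this toy setting the translated context of "fièvre" shares more weight with the context of "fever" than with that of "cough", so "fever" is ranked first. Real corpora would call for stop-word handling and a weighting scheme such as tf.idf, both omitted here.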