Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval

Authors:
H. Déjean;E. Gaussier;J. -M. Renders;F. Sadat
Affiliations:
Xerox Research Centre Europe, 6 Chemin de Maupertuis, F-38240 Meylan, France;Xerox Research Centre Europe, 6 Chemin de Maupertuis, F-38240 Meylan, France;Xerox Research Centre Europe, 6 Chemin de Maupertuis, F-38240 Meylan, France;Nara Institute of Science and Technology, Ikoma, Nara, Japan
Venue:
Artificial Intelligence in Medicine
Year:
2005

Citing 12
Cited 6

Word sense disambiguation using a second language monolingual corpus

Computational Linguistics
Information Retrieval

Information Retrieval
The Effects of Conjunction, Facet Structure, and Dictionary Combinations in Concept-Based Cross-Language Retrieval

Information Retrieval
Accurate methods for the statistics of surprise and coincidence

Computational Linguistics - Special issue on using large corpora: I
Text-translation alignment

Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
A word-to-word model of translational equivalence

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Flow network models for word alignment and terminology extraction from bilingual corpora

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A program for aligning sentences in bilingual corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Extraction of lexical translations from non-aligned corpora

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Automatic identification of word translations from unrelated English and German corpora

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
An approach based on multilingual thesauri and model combination for bilingual lexicon extraction

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1

Leveraging reusability: cost-effective lexical acquisition for large-scale ontology translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Translating medical terminologies through word alignment in parallel text corpora

Journal of Biomedical Informatics
A multilingual ontology framework for R&D project management systems

Expert Systems with Applications: An International Journal
Revisiting context-based projection methods for term-translation spotting in comparable corpora

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
A multi-view approach for term translation spotting

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Building and using comparable corpora for domain-specific bilingual lexicon extraction

BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

Quantified Score

Hi-index	0.00

Visualization

Abstract

Objectives:: We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. Material and methods:: We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and secondly, a new approach for bilingual lexicon extraction from comparable corpora, which uses a bilingual thesaurus as a pivot. We illustrate their use in multi-language information retrieval (English/German) in the medical domains. Results:: Our experiments show that these automatically extracted bilingual lexicons are accurate enough (85% precision for term extraction) for semi-automatically enriching mono- or bi-lingual thesauri such as the universal medical language system, and that their use in cross-language information retrieval significantly improves the retrieval performance (from 22 to 40% average precision) and clearly outperforms existing bilingual lexicon resources (both general lexicons and specialized ones). Conclusion:: We show in this paper first that bilingual lexicon extraction from parallel corpora in the medical domain could lead to accurate, specialized lexicons, which can be used to help enrich existing thesauri and second that bilingual lexicons extracted from comparable corpora outperform general bilingual resources for cross-language information retrieval.