Learning indexing patterns from one language for the benefit of others

Authors:
Udo Hahn;Kornél Markó;Stefan Schulz
Affiliations:
Text Knowledge Engineering Lab, Freiburg University, Freiburg, Germany;Text Knowledge Engineering Lab, Freiburg University, Freiburg, Germany and Department of Medical Informatics, Freiburg University Hospital, Freiburg, Germany;Department of Medical Informatics, Freiburg University Hospital, Freiburg, Germany
Venue:
AAAI'04 Proceedings of the 19th National Conference on Artificial Intelligence
Year:
2004

Abstract

Using language technology for text analysis and light-weight ontologies as a content-mediating level, we acquire indexing patterns from vast amounts of indexing data for English-language medical documents. This is achieved by statistically relating interlingual representations of these documents (based on text token bigrams) to their associated index terms. From these 'English' indexing patterns, we then induce the associated index terms for German and Portuguese documents when their interlingual representations match those of English documents. Thus, we learn from past English indexing experience and transfer it in an unsupervised way to non-English texts, without ever having seen concrete indexing data for languages other than English.
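
The transfer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the concept identifiers, the index terms, the function names, the whitespace tokenization, and the relative-frequency score with a 0.5 threshold are all assumptions chosen for illustration. It only shows the general shape of the approach: map tokens of any language to shared interlingual identifiers, relate bigrams of those identifiers to index terms using English data alone, and then reuse the learned patterns on German and Portuguese text.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: language-specific tokens mapped to shared
# interlingual concept identifiers (the identifiers are made up).
LEXICON = {
    "heart": "#cardi", "herz": "#cardi", "coração": "#cardi",
    "liver": "#hepat", "leber": "#hepat", "fígado": "#hepat",
    "inflammation": "#inflamm", "entzündung": "#inflamm",
    "inflamação": "#inflamm",
}


def interlingual_bigrams(text):
    """Map known tokens to concept IDs and return adjacent ID pairs."""
    ids = [LEXICON[t] for t in text.lower().split() if t in LEXICON]
    return list(zip(ids, ids[1:]))


def learn_patterns(indexed_docs):
    """Estimate, for each bigram, the share of documents containing it
    that carry each index term (a simple relative frequency)."""
    bigram_docs = Counter()
    cooccurrence = defaultdict(Counter)
    for text, terms in indexed_docs:
        for bigram in set(interlingual_bigrams(text)):
            bigram_docs[bigram] += 1
            for term in terms:
                cooccurrence[bigram][term] += 1
    return {
        bigram: {t: c / bigram_docs[bigram] for t, c in term_counts.items()}
        for bigram, term_counts in cooccurrence.items()
    }


def assign_terms(text, patterns, threshold=0.5):
    """Propose index terms whose best bigram score reaches the threshold."""
    best = {}
    for bigram in set(interlingual_bigrams(text)):
        for term, score in patterns.get(bigram, {}).items():
            best[term] = max(best.get(term, 0.0), score)
    return sorted(t for t, s in best.items() if s >= threshold)


# English-only training data with hypothetical index terms.
ENGLISH_DOCS = [
    ("heart inflammation", ["Myocarditis"]),
    ("liver inflammation", ["Hepatitis"]),
]
patterns = learn_patterns(ENGLISH_DOCS)

# No German or Portuguese indexing data was used to build the patterns.
german_terms = assign_terms("Herz Entzündung", patterns)
portuguese_terms = assign_terms("fígado inflamação", patterns)
```

In this sketch the German and Portuguese inputs receive index terms only because their interlingual bigrams coincide with bigrams seen in the English data. Real text would need proper tokenization and morphological analysis rather than whitespace splitting, which the sketch leaves out.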