Bootstrapping dictionaries for cross-language information retrieval

  • Authors:
  • Kornél Markó; Stefan Schulz; Olena Medelyan; Udo Hahn

  • Affiliations:
  • Freiburg University Hospital, Stefan-Meier-Str., Freiburg, Germany; Freiburg University Hospital, Stefan-Meier-Str., Freiburg, Germany; Freiburg University Hospital, Stefan-Meier-Str., Freiburg, Germany; Jena University, Fürstengraben, Jena, Germany

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Abstract

The bottleneck for dictionary-based cross-language information retrieval is the lack of comprehensive dictionaries, particularly ones that cover many different languages. We introduce a methodology by which multilingual dictionaries (for Spanish and Swedish) emerge automatically from simple seed lexicons. These seed lexicons are generated automatically, by cognate mapping, from previously manually constructed Portuguese, German, and English sources. Lexical and semantic hypotheses are then validated, and new ones iteratively generated, using co-occurrence patterns of hypothesized translation synonyms in parallel corpora. We evaluate the newly derived dictionaries on a large medical document collection in a cross-language retrieval setting.
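
The two steps named in the abstract, cognate mapping followed by validation against parallel text, can be illustrated with a small Python sketch. This is not the authors' procedure: the string-similarity measure (`difflib.SequenceMatcher`), the threshold, the support count, the function names, and the toy Portuguese/Spanish data are all assumptions chosen for illustration, and the sketch runs a single validation pass rather than the iterative hypothesis generation the abstract describes.

```python
from difflib import SequenceMatcher


def cognate_candidates(source_lexicon, target_vocabulary, threshold=0.8):
    """Propose (source, target) pairs whose spellings are similar.

    The similarity measure and threshold are illustrative assumptions.
    """
    candidates = []
    for src in source_lexicon:
        for tgt in target_vocabulary:
            score = SequenceMatcher(None, src, tgt).ratio()
            if score >= threshold:
                candidates.append((src, tgt))
    return candidates


def validate_by_cooccurrence(candidates, aligned_pairs, min_support=2):
    """Keep a candidate only if both words co-occur in enough aligned sentence pairs."""
    accepted = []
    for src, tgt in candidates:
        support = sum(
            1
            for src_sentence, tgt_sentence in aligned_pairs
            if src in src_sentence.split() and tgt in tgt_sentence.split()
        )
        if support >= min_support:
            accepted.append((src, tgt))
    return accepted


# Hypothetical toy data: a Portuguese seed lexicon, a Spanish vocabulary,
# and a tiny sentence-aligned parallel corpus (Portuguese, Spanish).
source_lexicon = ["hepatite", "esquisito"]
target_vocabulary = ["hepatitis", "exquisito", "hígado"]
aligned_pairs = [
    ("o paciente tem hepatite", "el paciente tiene hepatitis"),
    ("hepatite viral aguda", "hepatitis viral aguda"),
    ("um caso esquisito", "un caso raro"),
]

candidates = cognate_candidates(source_lexicon, target_vocabulary)
accepted = validate_by_cooccurrence(candidates, aligned_pairs)
print(candidates)
print(accepted)
```

In this toy run, spelling similarity alone proposes both "hepatite"/"hepatitis" and the false friend "esquisito"/"exquisito"; only the first pair co-occurs in the aligned sentences, so only it survives validation. That is the role the abstract assigns to co-occurrence patterns in parallel corpora.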