Dictionary-based CLIR loses highly relevant documents

  • Authors:
  • Raija Lehtokangas;Heikki Keskustalo;Kalervo Järvelin

  • Affiliations:
  • Department of Information Studies, University of Tampere, Finland (all three authors)

  • Venue:
  • ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
  • Year:
  • 2005

Abstract

Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best-match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. First, monolingual baseline queries were automatically formed from the topics. Second, source-language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish), using both structured and unstructured queries. The effectiveness of the translated queries was compared to that of the monolingual queries. CLIR performance was evaluated using three relevance criteria: stringent, regular, and liberal. When regular or liberal criteria were used, reasonable performance was achieved. Adopting stringent criteria caused a considerable loss of performance compared to monolingual Finnish performance.
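
As an illustration of how one ranked result list can score differently under the three relevance criteria, here is a minimal Python sketch. It rests on assumptions not stated in the abstract: a four-point graded scale (0 to 3), thresholds of 3, 2, and 1 for the stringent, regular, and liberal criteria, and average precision as the effectiveness measure. The document identifiers and function names are hypothetical.

```python
from typing import Dict, List

# Assumed mapping from criterion name to the minimum grade that counts
# as relevant on a hypothetical 0-3 graded relevance scale.
CRITERIA: Dict[str, int] = {"stringent": 3, "regular": 2, "liberal": 1}


def average_precision(ranking: List[str], grades: Dict[str, int], min_grade: int) -> float:
    """Non-interpolated average precision after binarizing graded judgments."""
    relevant = {doc for doc, grade in grades.items() if grade >= min_grade}
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def evaluate(ranking: List[str], grades: Dict[str, int]) -> Dict[str, float]:
    """Score one ranked list under each assumed relevance criterion."""
    return {name: average_precision(ranking, grades, threshold)
            for name, threshold in CRITERIA.items()}


# Hypothetical judgments and a ranking that places the only highly
# relevant document (d1) third.
grades = {"d1": 3, "d2": 1, "d3": 2, "d4": 0}
ranking = ["d2", "d3", "d1", "d4"]
scores = evaluate(ranking, grades)
```

In this toy case the ranking is perfect under the liberal criterion but scores much lower under the stringent one, because the single highly relevant document is ranked below marginally relevant ones. This is the kind of gap the abstract reports between the criteria.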