Dictionary-Based Cross-Language Information Retrieval: Learning Experiences from CLEF 2000–2002

  • Authors:
  • Turid Hedlund; Eija Airio; Heikki Keskustalo; Raija Lehtokangas; Ari Pirkola; Kalervo Järvelin

  • Affiliations:
  • Department of Information Studies, University of Tampere, Finland. turid.hedlund@shh.fi; Department of Information Studies, University of Tampere, Finland. eija.airio@uta.fi; Department of Information Studies, University of Tampere, Finland. heikki.keskustalo@uta.fi; Department of Information Studies, University of Tampere, Finland. raija.lehtokangas@uta.fi; Department of Information Studies, University of Tampere, Finland. pirkola@tukki.jyu.fi; Department of Information Studies, University of Tampere, Finland. kalervo.jarvelin@uta.fi

  • Venue:
  • Information Retrieval
  • Year:
  • 2004

Abstract

This study presents the basic framework and performance analysis results for the three-year development process of the dictionary-based UTACLIR system. The tests expand from bilingual CLIR for three language pairs (Swedish, Finnish and German to English) to six language pairs (English to French, German, Spanish, Italian, Dutch and Finnish), and from bilingual to multilingual retrieval. In addition, transitive translation tests are reported. The development of the UTACLIR query translation system is examined as a learning process. The contribution of the individual components and the effectiveness of compound handling, proper name matching and query structuring are analyzed. The results and the fault analysis have been valuable in the development process. Overall, the results indicate that the process is robust and can be extended to other languages. The individual effects of the different components are in general positive. However, performance also depends on the topic set, on the number of compounds and proper names in the topics, and to some extent on the source and target language. The dictionaries used affect performance significantly.