Multilingual Information Retrieval Based on Document Alignment Techniques

  • Authors:
  • Martin Braschler; Peter Schäuble

  • Venue:
  • ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 1998

Abstract

A multilingual information retrieval method is presented in which users formulate a query in their preferred language to retrieve relevant information from a multilingual document collection. The method involves mono- and cross-language searches as well as the merging of their results. We adopt a corpus-based approach in which documents in different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques that we have developed. First, it enables Cross-Language Information Retrieval (CLIR) that does not suffer from the lack of vocabulary coverage we observed in approaches based on automatic Machine Translation (MT). Second, the aligned documents of this corpus facilitate merging the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights into the usefulness of comparable corpora.
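
The abstract does not spell out how aligned documents are used to merge result lists, so the following is only an illustrative sketch of one plausible approach, not the authors' procedure. It assumes that document pairs from the alignment which appear in both the mono- and the cross-language result list can serve as anchor points for a least-squares linear mapping between the two score scales. The function names `fit_score_map` and `merge_runs`, the document identifiers, and the scores are all hypothetical.

```python
from typing import Dict, List, Tuple

Run = Dict[str, float]  # document id -> retrieval score (hypothetical format)


def fit_score_map(mono: Run, cross: Run,
                  alignments: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Fit y = a*x + b mapping cross-language scores onto the mono-language scale.

    Only aligned pairs (mono_id, cross_id) present in both runs are used.
    Falls back to the identity mapping when there are too few anchor points.
    """
    pairs = [(cross[c], mono[m]) for m, c in alignments
             if m in mono and c in cross]
    if len(pairs) < 2:
        return 1.0, 0.0  # assumption: leave scores unchanged without anchors
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        return 1.0, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov / var_x
    return a, mean_y - a * mean_x


def merge_runs(mono: Run, cross: Run,
               alignments: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Rescale the cross-language run and merge both runs into one ranking."""
    a, b = fit_score_map(mono, cross, alignments)
    merged = list(mono.items()) + [(d, a * s + b) for d, s in cross.items()]
    return sorted(merged, key=lambda item: item[1], reverse=True)


# Toy data: two runs on different score scales and two aligned pairs.
mono_run = {"en1": 0.9, "en2": 0.5, "en3": 0.2}
cross_run = {"de1": 9.0, "de2": 5.0, "de3": 7.0}
aligned = [("en1", "de1"), ("en2", "de2")]
ranking = merge_runs(mono_run, cross_run, aligned)
```

In this toy example the two aligned pairs place the cross-language scores on the mono-language scale, so the unaligned document `de3` is ranked between the two aligned pairs. A real system would need many more anchor points and some robustness against poor alignments.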