Multilingual Information Retrieval Using Machine Translation, Relevance Feedback and Decompounding

  • Authors:
  • Aitao Chen; Fredric C. Gey

  • Affiliations:
  • School of Information Management and Systems, University of California at Berkeley, CA 94720-4600, USA. aitao@sims.berkeley.edu; UC Data Archive & Technical Assistance (UC DATA), University of California at Berkeley, CA 94720-5100, USA. gey@ucdata.berkeley.edu

  • Venue:
  • Information Retrieval
  • Year:
  • 2004

Abstract

Multilingual retrieval (querying multiple document collections, each in a different language) can be achieved by combining several individual techniques that enhance retrieval: machine translation to cross the language barrier, relevance feedback to add words to the initial query, decompounding for languages with complex term structure, and data fusion to combine monolingual retrieval results from different languages. Using the CLEF 2001 and CLEF 2002 topics and document collections, this paper evaluates these techniques within the context of a monolingual document ranking formula based on logistic regression. Each individual technique yields improved performance over runs that do not use that technique. Moreover, the techniques are complementary: combining the best techniques outperforms any individual technique. An approximate but fast document translation method, using bilingual wordlists created from machine translation systems, is presented and evaluated. The fast document translation is as effective as query translation in multilingual retrieval. Furthermore, when fast document translation is combined with query translation in multilingual retrieval, the performance is significantly better than that of either query translation or fast document translation alone.
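
The fast document translation idea can be illustrated with a minimal sketch: each document word is looked up in a bilingual wordlist and replaced by its entry. This is only an illustration under stated assumptions, not the authors' implementation. The function name `translate_document`, the tiny hand-written German-to-English wordlist, the whitespace tokenization, and the choice to keep unknown words unchanged are all hypothetical; the abstract says only that the wordlists are created from machine translation systems.

```python
def translate_document(text, wordlist):
    """Approximate translation: replace each word by its wordlist entry.

    `wordlist` is a dict mapping a source-language word to one
    target-language word. Word order and context are ignored, which is
    why this is fast but only approximate.
    """
    translated = []
    for token in text.lower().split():
        # Assumption: words missing from the wordlist (for example
        # proper nouns) are kept unchanged.
        translated.append(wordlist.get(token, token))
    return " ".join(translated)


# Hypothetical wordlist, hand-written here for illustration only.
wordlist = {"das": "the", "haus": "house", "ist": "is", "alt": "old"}

print(translate_document("Das Haus ist alt", wordlist))
```

Because each lookup is a single dictionary access, the cost of translating a collection grows linearly with its number of tokens, which is what makes translating every document practical in this kind of setup.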