Cross-lingual relevance models

  • Authors:
  • Victor Lavrenko; Martin Choquette; W. Bruce Croft

  • Affiliations:
  • Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA; Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA; Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA

  • Venue:
  • SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2002

Abstract

We propose a formal model of Cross-Language Information Retrieval that does not rely on either query translation or document translation. Our approach leverages recent advances in language modeling to directly estimate an accurate topic model in the target language, starting with a query in the source language. The model integrates popular techniques of disambiguation and query expansion in a unified formal framework. We describe how the topic model can be estimated with either a parallel corpus or a dictionary. We test the framework by constructing Chinese topic models from English queries and using them in the CLIR task of TREC9. The model achieves performance around 95% of the strong mono-lingual baseline in terms of average precision. In initial precision, our model outperforms the mono-lingual baseline by 20%. The main contribution of this work is the unified formal model which integrates techniques that are essential for effective Cross-Language Retrieval.