Cross-lingual random indexing for information retrieval

  • Authors:
  • Hans Moen;Erwin Marsi

  • Affiliations:
  • Dept. of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway;Dept. of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway

  • Venue:
  • SLSP'13: Proceedings of the First International Conference on Statistical Language and Speech Processing
  • Year:
  • 2013

Abstract

Cross-lingual information retrieval aims to retrieve relevant documents from a collection written in a language different from the query language. A novel method is proposed that avoids direct translation of queries by implicitly encoding translations in a bilingual vector space model (VSM). Both queries and documents are represented as vectors using an extension of random indexing (RI). As work on RI for information retrieval is limited, it is first evaluated for monolingual retrieval. Two variants are tested: (1) a direct RI model that approximates a standard VSM; (2) an indirect RI model intended to capture latent semantic relations among terms with a sliding window procedure. Next, cross-lingual extensions of these models are presented and evaluated for cross-lingual document retrieval.
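
The abstract does not give implementation details, so the following is only a minimal monolingual sketch of how the two RI variants could look, not the authors' method. It assumes sparse ternary index vectors, a hash-based seed so that each term always receives the same vector, raw term counts without weighting, and a symmetric window. All function names and parameter values (`index_vector`, `dim=512`, `nonzeros=8`, `window=2`) are hypothetical. The cross-lingual extension is not sketched, because the abstract does not describe how translations are encoded.

```python
import hashlib

import numpy as np


def index_vector(term, dim=512, nonzeros=8):
    """Sparse ternary index vector for a term (assumed design).

    The seed is derived from the term itself, so the same term always
    maps to the same vector without storing a lookup table.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:4], "little"))
    vec = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions[: nonzeros // 2]] = 1.0   # half of the entries are +1
    vec[positions[nonzeros // 2:]] = -1.0   # the other half are -1
    return vec


def direct_doc_vector(tokens, dim=512):
    """Direct RI: sum the index vectors of the terms in a text.

    Nearly orthogonal index vectors make this a low-dimensional
    approximation of a term-count vector in a standard VSM.
    """
    vec = np.zeros(dim)
    for term in tokens:
        vec += index_vector(term, dim)
    return vec


def context_vectors(corpus, window=2, dim=512):
    """Indirect RI, step 1: build one context vector per term by adding
    the index vectors of its neighbours inside a sliding window."""
    ctx = {}
    for tokens in corpus:
        for i, term in enumerate(tokens):
            vec = ctx.setdefault(term, np.zeros(dim))
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vec += index_vector(tokens[j], dim)  # in-place update
    return ctx


def indirect_doc_vector(tokens, ctx, dim=512):
    """Indirect RI, step 2: sum the context vectors of the known terms."""
    vec = np.zeros(dim)
    for term in tokens:
        if term in ctx:
            vec += ctx[term]
    return vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return float(a @ b / (norm_a * norm_b))


if __name__ == "__main__":
    # Toy corpus: "car" and "automobile" never co-occur but share contexts.
    corpus = [["car", "drives", "road"], ["automobile", "drives", "road"]]
    ctx = context_vectors(corpus)
    print("direct:  ", cosine(direct_doc_vector(["car"]),
                              direct_doc_vector(["automobile"])))
    print("indirect:", cosine(indirect_doc_vector(["car"], ctx),
                              indirect_doc_vector(["automobile"], ctx)))
```

In this toy setting the direct model treats "car" and "automobile" as unrelated terms, while the indirect model assigns them the same context vector because they occur with the same neighbours. This is the kind of latent relation the sliding window procedure is intended to capture.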