Domain adaptation of statistical machine translation models with monolingual data for cross lingual information retrieval

  • Authors: Vassilina Nikoulina; Stéphane Clinchant

  • Affiliations: Xerox Research Center Europe, Meylan, France (both authors)

  • Venue: ECIR'13: Proceedings of the 35th European Conference on Advances in Information Retrieval
  • Year: 2013

Abstract

Statistical Machine Translation (SMT) is often used as a black box in cross-lingual information retrieval (CLIR) tasks. We propose a method for adapting an SMT model that relies on monolingual statistics extracted from the document collection (both source and target, if available). We evaluate our approach on the CLEF Domain Specific task (German-English and English-German) and show that very simple document collection statistics, integrated into the SMT translation model, yield good gains in terms of both IR metrics (MAP, P10) and MT evaluation metrics (BLEU, TER).