Selecting automatically the best query translations

  • Authors:
  • Pierre-Yves Berger; Jacques Savoy

  • Affiliations:
  • University of Neuchatel, Neuchatel, Switzerland; University of Neuchatel, Neuchatel, Switzerland

  • Venue:
  • Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
  • Year:
  • 2007

Abstract

In order to search corpora written in two or more languages, the simplest and most efficient approach is to translate the submitted query into the required language(s). To achieve this goal, we developed an IR model based on translation tools freely available on the Web (bilingual machine-readable dictionaries and machine translation systems). When comparing the retrieval effectiveness of manually and automatically translated queries, we found that manual translation outperformed machine-based approaches, although the size of the performance difference varied from one language to the next. Moreover, a query-by-query analysis showed that the retrieval performance of machine-translated queries varied a great deal. We therefore investigated whether we could predict the retrieval performance of a translated query and use this knowledge to select the best translation(s). To do so, we designed and evaluated a predictive system based on logistic regression and then used it to select the most appropriate machine-based translations. Using a set of 99 queries and a document collection available in German and Spanish (extracted from the CLEF-2001 and CLEF-2002 test suites), we show that the retrieval performance of the proposed query translation selection procedure is statistically significantly better than that of the single best MT system, but still inferior to the retrieval performance obtained with manual translations.
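The abstract does not say which predictors the logistic regression uses, so the following is only a minimal sketch of the general idea under stated assumptions. Each candidate translation of a query is described by two hypothetical features (`untranslated_ratio`, the fraction of query terms left untranslated, and `tool_agreement`, the fraction of terms on which two translation tools agree). A binary label marks whether that translation retrieved well (hypothetically, average precision above some threshold). The fitted model then ranks candidates by their predicted probability of success. The training numbers and the tool names `MT-A`, `MT-B` and `dictionary` are invented toy values, not CLEF data, and the plain gradient-descent fit stands in for whatever estimation procedure the authors used.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(translation is effective | features x); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def select_best(w: Sequence[float],
                candidates: Sequence[Tuple[str, Sequence[float]]],
                k: int = 1) -> List[str]:
    """Return the names of the k candidates with the highest predicted probability."""
    ranked = sorted(candidates, key=lambda c: predict_proba(w, c[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical toy training data: [untranslated_ratio, tool_agreement]
X_train = [[0.0, 0.9], [0.1, 0.8], [0.05, 0.7],
           [0.5, 0.2], [0.6, 0.1], [0.4, 0.3]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = translated query retrieved well (hypothetical label)

weights = train_logistic(X_train, y_train)

# Candidate translations of one new query (placeholder names and feature values)
candidates = [("MT-A", [0.0, 0.85]),
              ("MT-B", [0.5, 0.2]),
              ("dictionary", [0.3, 0.4])]
print(select_best(weights, candidates))       # ['MT-A']
print(select_best(weights, candidates, k=2))  # ['MT-A', 'dictionary']
```

Because the sigmoid is monotonic and the intercept is shared by all candidates, the ranking depends only on the learned feature weights; setting `k` above 1 returns several top-ranked translations, in line with selecting "the best translation(s)", though how the selected translations would then be combined into one query is not specified here.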