Selecting automatically the best query translations

Authors:
Pierre-Yves Berger; Jacques Savoy
Affiliations:
University of Neuchatel, Neuchatel, Switzerland; University of Neuchatel, Neuchatel, Switzerland
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

In order to search corpora written in two or more languages, the simplest and most efficient approach is to translate the submitted query into the required language(s). To achieve this goal, we developed an IR model based on translation tools freely available on the Web (bilingual machine-readable dictionaries and machine translation systems). When comparing the retrieval effectiveness of manually and automatically translated queries, we found that manual translation outperformed machine-based approaches, although the size of the gap varied from one language to the next. Moreover, a query-by-query analysis showed that the performance of machine-translated queries varied a great deal. We therefore investigated whether we could predict the retrieval performance of a translated query and use this prediction to select the best translation(s). To do so, we designed and evaluated a predictive system based on logistic regression and then used it to select the most appropriate machine-based translations. Using a set of 99 queries and a document collection available in German and Spanish (extracted from the CLEF-2001 and CLEF-2002 test suites), we show that the retrieval performance of the proposed query translation selection procedure is statistically significantly better than that of the single best MT system, but still inferior to the retrieval performance obtained with manual translations.
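The abstract does not say which features the predictor uses or how the logistic regression was fitted. The following minimal sketch therefore rests on stated assumptions. Each candidate translation of a query is described by two hypothetical numeric features, for example the fraction of source terms that received a translation and the agreement with the other tools' outputs. The binary label marks whether that translation performed "well", which here means above some effectiveness threshold. The sketch fits a plain logistic regression by batch gradient ascent on the log-likelihood, scores each candidate with the fitted model, and keeps the top-k. All names and numbers (`fit_logistic`, `select_translations`, `TRAIN_X`, `CANDIDATES`, the feature values) are illustrative placeholders, not data or code from the paper.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that a translation with features x performs well.
    w[0] is the intercept; w[1:] are the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)  # gradient of log-likelihood w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def select_translations(w: Sequence[float],
                        candidates: Sequence[Tuple[str, Sequence[float]]],
                        k: int = 1) -> List[str]:
    """Return the names of the k candidates with the highest predicted probability."""
    ranked = sorted(candidates, key=lambda c: predict(w, c[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical training data: [fraction of terms translated, agreement with other tools]
TRAIN_X = [[0.9, 0.8], [0.8, 0.7], [0.95, 0.9], [0.3, 0.2], [0.4, 0.1], [0.2, 0.3]]
TRAIN_Y = [1, 1, 1, 0, 0, 0]  # 1 = translated query performed well (hypothetical label)

# Hypothetical candidate translations of one new query
CANDIDATES = [("MT-A", [0.85, 0.75]), ("MT-B", [0.35, 0.25]), ("Dict-C", [0.6, 0.5])]

if __name__ == "__main__":
    weights = fit_logistic(TRAIN_X, TRAIN_Y)
    print(select_translations(weights, CANDIDATES, k=2))  # expected: ['MT-A', 'Dict-C']
```

Selecting with k greater than 1 mirrors the abstract's "best translation(s)". How the selected translations would then be used (searched individually or merged into a single query) is not shown here. In practice a statistics package would normally fit the model. The hand-written gradient ascent only keeps the sketch dependency-free.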