Multilingual Information Retrieval Based on Parallel Texts from the Web

Authors:
Jian-Yun Nie; Michel Simard; George Foster
Venue:
CLEF '00 Revised Papers from the Workshop of Cross-Language Evaluation Forum on Cross-Language Information Retrieval and Evaluation
Year:
2000


Abstract

In this paper, we describe our approach to the CLEF Cross-Language IR (CLIR) tasks. In our experiments, we used statistical translation models for query translation. Some of the models were trained on parallel web pages automatically mined from the Web; others were trained on bilingual dictionaries and lexical databases. These models were then combined for query translation. Our goal in this series of experiments was to test whether parallel web pages can be used effectively to translate queries in multilingual IR. In particular, we compare models trained only on Web documents with models that also incorporate other resources such as dictionaries. Our results show that models trained on the parallel web pages can achieve reasonable CLIR performance. However, combining models effectively is a difficult task, and single models still yield better results.
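
The abstract does not say how the translation models are combined, so the following is only a minimal sketch under stated assumptions. It treats each model as a table of word translation probabilities and merges two such tables by linear interpolation, a common choice but not necessarily the one used in the paper. The function names (`interpolate`, `translate_query`), the weight `lam`, and all probability values are hypothetical and invented for illustration.

```python
from collections import defaultdict


def interpolate(model_a, model_b, lam=0.5):
    """Combine two translation tables: lam * model_a + (1 - lam) * model_b.

    Each model maps a source word to {target word: probability}.
    Linear interpolation is an assumption, not the paper's stated method.
    """
    combined = {}
    for src in set(model_a) | set(model_b):
        pa = model_a.get(src, {})
        pb = model_b.get(src, {})
        combined[src] = {
            tgt: lam * pa.get(tgt, 0.0) + (1 - lam) * pb.get(tgt, 0.0)
            for tgt in set(pa) | set(pb)
        }
    return combined


def translate_query(query_words, model, top_n=2):
    """Sum translation probabilities over the query words; keep the top_n targets."""
    scores = defaultdict(float)
    for word in query_words:
        for target, prob in model.get(word, {}).items():
            scores[target] += prob
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy probabilities, invented for illustration only.
web_model = {"bank": {"banque": 0.6, "rive": 0.4}}    # stands in for a Web-trained model
dict_model = {"bank": {"rive": 0.7, "banque": 0.3}}   # stands in for a dictionary-based model

combined = interpolate(web_model, dict_model, lam=0.5)
best = translate_query(["bank"], combined, top_n=1)
```

In this toy setup the two models disagree on the preferred translation, and the interpolation weight decides which one wins. This gives one intuition for why combining models effectively can be harder than using a single well-trained model.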