Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora

  • Authors:
  • Fatiha Sadat; Masatoshi Yoshikawa; Shunsuke Uemura

  • Affiliations:
  • Nara Institute of Science and Technology, Ikoma, Nara, Japan; Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan; Nara Institute of Science and Technology, Ikoma, Nara, Japan

  • Venue:
  • Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2003

Abstract

This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluates it on Cross-Language Information Retrieval (CLIR). We explore a bi-directional extraction of bilingual terminology, primarily from comparable corpora, and propose a model that combines statistics-based and linguistics-based criteria to select the best translation candidates for phrasal translation. Evaluations on a large Japanese-English test collection showed that the proposed combination of bi-directional comparable corpora, bilingual dictionaries and transliteration, augmented with linguistics-based pruning, is highly effective for CLIR.
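The abstract does not specify the similarity measure or how the two extraction directions are combined. As a hedged illustration only, the sketch below uses the widely used context-vector technique for comparable corpora: co-occurrence vectors are projected into the other language through a seed dictionary and compared by cosine similarity. It reads "bi-directional" as a mutual top-k check, meaning a pair is kept only if each word ranks the other among its best candidates. All function names, the window size, the `top_k` threshold, the toy corpora and the seed dictionary are hypothetical. This is not the authors' model, and it omits their transliteration and linguistics-based pruning.

```python
import math
from collections import Counter

# Hypothetical sketch of context-vector lexicon extraction from comparable
# corpora; NOT the authors' model. Window size and top_k are assumptions.

def context_vectors(sentences, window=2):
    """Count how often each word co-occurs with neighbours within `window` tokens."""
    vecs = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            ctx = vecs.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[sent[j]] += 1
    return vecs

def translate_vector(vec, seed_dict):
    """Project a context vector into the other language via a seed dictionary.
    Context words missing from the dictionary are dropped."""
    out = Counter()
    for word, count in vec.items():
        for trans in seed_dict.get(word, []):
            out[trans] += count
    return out

def cosine(a, b):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(word, from_vecs, to_vecs, seed_dict, top_k):
    """Return up to top_k words in the other language most similar to `word`."""
    projected = translate_vector(from_vecs[word], seed_dict)
    scored = sorted(((cosine(projected, v), t) for t, v in to_vecs.items()),
                    key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [t for score, t in scored[:top_k] if score > 0]

def bidirectional_pairs(src_words, src_vecs, tgt_vecs, s2t, t2s, top_k=1):
    """Keep (s, t) only if t is in s's top_k AND s is in t's top_k (reverse search)."""
    pairs = []
    for s in src_words:
        for t in rank_candidates(s, src_vecs, tgt_vecs, s2t, top_k):
            if s in rank_candidates(t, tgt_vecs, src_vecs, t2s, top_k):
                pairs.append((s, t))
    return pairs

# Hypothetical toy "comparable" corpora (romanized Japanese / English)
# and a tiny seed dictionary in both directions.
JA = [["kuruma", "hashiru", "michi"], ["neko", "nemuru", "ie"]]
EN = [["car", "runs", "road"], ["cat", "sleeps", "house"]]
S2T = {"hashiru": ["runs"], "michi": ["road"], "nemuru": ["sleeps"], "ie": ["house"]}
T2S = {t: [s] for s, ts in S2T.items() for t in ts}

ja_vecs, en_vecs = context_vectors(JA), context_vectors(EN)
print(bidirectional_pairs(["kuruma", "neko"], ja_vecs, en_vecs, S2T, T2S))
```

On this toy data the call prints `[('kuruma', 'car'), ('neko', 'cat')]`. With `top_k=3` on the same data, every forward candidate also passes the reverse check (so "road" and "runs" are kept for "kuruma"), which suggests that on small corpora the threshold matters at least as much as the mutual check itself.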