Translating unknown queries with web corpora for cross-language information retrieval

  • Authors:
  • Pu-Jen Cheng; Jei-Wen Teng; Ruei-Cheng Chen; Jenq-Haur Wang; Wen-Hsiang Lu; Lee-Feng Chien

  • Affiliations:
  • Academia Sinica, Taiwan; Academia Sinica, Taiwan; Academia Sinica, Taiwan; Academia Sinica, Taiwan; National Cheng Kung University, Taiwan; Academia Sinica, Taiwan and National Taiwan University, Taiwan

  • Venue:
  • Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2004

Abstract

Cross-language information retrieval (CLIR) systems must be able to translate unknown queries, because real queries are often short. This paper investigates the feasibility of using the Web as the corpus source for translating unknown queries in CLIR. We propose an online translation approach that determines effective translations for unknown query terms by mining bilingual search-result pages obtained from Web search engines. This approach can alleviate the lack of large bilingual corpora, translate many unknown query terms, provide flexible query specifications, and extract semantically close translations that benefit CLIR tasks, especially cross-language Web search.
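
The abstract does not spell out how the search-result pages are mined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an English source term with Chinese translations, treats character n-grams as translation candidates, and ranks them by plain co-occurrence counts. The function names, the n-gram lengths, the scoring rule, and the sample snippets are all hypothetical; a real system would fetch snippets from a search engine and would likely use a stronger association measure.

```python
import re
from collections import Counter

# Runs of CJK ideographs. Assumption: English source term, Chinese targets.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_ngrams(snippet, min_len=2, max_len=4):
    """Return the set of Chinese character n-grams found in one snippet."""
    grams = set()
    for run in CJK_RUN.findall(snippet):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                grams.add(run[i:i + n])
    return grams


def rank_translations(term, snippets, top_k=3):
    """Rank candidates by the number of snippets mentioning `term` that contain them."""
    counts = Counter()
    for snippet in snippets:
        # Only snippets that mention the source term contribute evidence.
        if term.lower() in snippet.lower():
            counts.update(candidate_ngrams(snippet))
    # Highest count first; ties broken by length, then text, for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return ranked[:top_k]


# Hypothetical snippets standing in for bilingual search-result text.
snippets = [
    "雅虎 Yahoo 奇摩 首頁",
    "Yahoo 雅虎 新聞",
    "搜尋引擎 比較",
]
best = rank_translations("Yahoo", snippets)
```

In this toy input, the candidate that appears alongside the source term in the most snippets is ranked first, and the snippet that never mentions the term is ignored.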