Towards effective strategies for monolingual and bilingual information retrieval: Lessons learned from NTCIR-4

  • Authors:
  • Yan Qu; David A. Hull; Gregory Grefenstette; David A. Evans; Motoko Ishikawa; Setsuko Nara; Toshiya Ueda; Daisuke Noda; Kousaku Arita; Yuki Funakoshi; Hiroshi Matsuda

  • Affiliations:
  • Clairvoyance Corporation, Pittsburgh, PA (Yan Qu, David A. Hull, Gregory Grefenstette, David A. Evans); Justsystem Corporation, Tokushima-city, Japan (Motoko Ishikawa, Setsuko Nara, Toshiya Ueda, Daisuke Noda, Kousaku Arita, Yuki Funakoshi, Hiroshi Matsuda)

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2005

Abstract

At the NTCIR-4 workshop, Justsystem Corporation (JSC) and Clairvoyance Corporation (CC) collaborated on the cross-language information retrieval (CLIR) task. Our goal was to evaluate the performance and robustness of our recently developed commercial-grade CLIR systems for English and Asian languages. The main contribution of this article is an investigation of different retrieval strategies, their interactions in both monolingual and bilingual retrieval, and their respective contributions to operational retrieval systems in the context of NTCIR-4. We report results of Japanese and English monolingual retrieval and of Japanese-to-English bilingual retrieval. In the monolingual retrieval analysis, we examine two special properties of the NTCIR experimental design (two levels of relevance judgments and identical queries in multiple languages) and explore how they interact with strategies in our retrieval system, including pseudo-relevance feedback, multi-word term down-weighting, and term weight merging. Our analysis shows that the choice of language (English or Japanese) does not have a significant impact on retrieval performance. Query expansion is slightly more effective with relaxed judgments than with rigid judgments. For better retrieval performance, the weights of multi-word terms should be lowered. In the bilingual retrieval analysis, we aim to identify robust strategies that are effective both when used alone and when combined with other strategies. We examine strategies specific to cross-language retrieval, such as translation disambiguation and translation structuring, as well as general strategies such as pseudo-relevance feedback and multi-word term down-weighting. For the shorter title topics, pseudo-relevance feedback is a major performance enhancer, but translation structuring degrades retrieval performance whether used alone or in combination with other strategies. All of the strategies tested improve retrieval performance for the longer description topics, with pseudo-relevance feedback and translation structuring as the major contributors.
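The abstract names pseudo-relevance feedback and multi-word term down-weighting without describing how the CC/JSC systems implement them. What follows is a minimal Python sketch of the general techniques under stated assumptions, not the authors' implementation: documents are lists of index terms, a multi-word term is any term containing a space, scoring is a plain tf-idf dot product, and feedback adds the highest-weighted new terms from the top-ranked documents (a simplified Rocchio-style expansion). All function names and the parameter values `PHRASE_FACTOR`, `TOP_K`, `EXPANSION_TERMS`, and `BETA` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical parameters; the abstract does not give the systems' actual settings.
PHRASE_FACTOR = 0.5   # multiplier applied to multi-word term weights
TOP_K = 2             # number of top-ranked documents assumed relevant
EXPANSION_TERMS = 3   # number of new terms added by feedback
BETA = 0.5            # weight given to feedback evidence


def downweight_phrases(query, factor=PHRASE_FACTOR):
    """Scale down the weight of multi-word terms (terms containing a space)."""
    return {t: w * factor if " " in t else w for t, w in query.items()}


def idf_table(docs):
    """Smoothed inverse document frequency; always positive."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(1 + n / c) for t, c in df.items()}


def score(query, doc, idf):
    """tf-idf dot product between a weighted query and one document."""
    tf = Counter(doc)
    return sum(w * tf[t] * idf.get(t, 0.0) for t, w in query.items() if t in tf)


def rank(query, docs, idf):
    """Document indices sorted by descending score."""
    return sorted(range(len(docs)), key=lambda i: score(query, docs[i], idf), reverse=True)


def pseudo_relevance_feedback(query, docs, idf, k=TOP_K, m=EXPANSION_TERMS, beta=BETA):
    """Treat the top-k documents as relevant and add their m strongest new terms."""
    top = rank(query, docs, idf)[:k]
    # Mean tf-idf weight of each term across the pseudo-relevant documents.
    centroid = Counter()
    for i in top:
        for t, c in Counter(docs[i]).items():
            centroid[t] += c * idf[t] / k
    expanded = dict(query)
    new_terms = [t for t, _ in centroid.most_common() if t not in query][:m]
    for t in new_terms:
        expanded[t] = beta * centroid[t]
    return expanded


# Toy collection: the phrase "cross language" is indexed as a single term.
docs = [
    ["cross language", "retrieval", "translation", "dictionary"],
    ["retrieval", "feedback", "translation"],
    ["cooking", "recipe", "rice"],
]
idf = idf_table(docs)
query = downweight_phrases({"retrieval": 1.0, "cross language": 1.0})
expanded = pseudo_relevance_feedback(query, docs, idf)
print(sorted(expanded.items()))
```

In this sketch the phrase weight is lowered before the initial ranking, so that multi-word terms do not dominate the choice of feedback documents; a fuller system might also down-weight phrases that feedback itself introduces, or keep original query terms fixed while re-estimating their weights.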