Indonesian-Japanese CLIR using only limited resource

  • Authors:
  • Ayu Purwarianti;Masatoshi Tsuchiya;Seiichi Nakagawa

  • Affiliations:
  • Toyohashi University of Technology;Toyohashi University of Technology;Toyohashi University of Technology

  • Venue:
  • CLIIR '06 Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval?
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Our research aim here is to build a CLIR system that works for a language pair with poor resources where the source language (e.g. Indonesian) has limited language resources. Our Indonesian-Japanese CLIR system employs the existing Japanese IR system, and we focus our research on the Indonesian-Japanese query translation. There are two problems in our limited resource query translation: the OOV problem and the translation ambiguity. The OOV problem is handled using target language's resources (English-Japanese dictionary and Japanese proper name dictionary). The translation ambiguity is handled using a Japanese monolingual corpus in our translation filtering. We select the final translation set using the mutual information score and the TFxIDF score. The result on NTCIR 3 (NII-NACSIS Test Collection for IR Systems) Web Retrieval Task shows that the translation method achieved a higher IR score than the transitive machine translation (using Kataku (Indonesian-English) and Babelfish/ Excite (English-Japanese) engine) result. The best result achieved about 49% of the monolingual retrieval.