Automatic extraction of bilingual word pairs using inductive chain learning in various languages

  • Authors:
  • Hiroshi Echizen-ya; Kenji Araki; Yoshio Momouchi

  • Affiliations:
  • Department of Electronics and Information Engineering, Hokkai-Gakuen University, Chuo-ku, Sapporo, Japan; Graduate School of Information Science and Technology, Hokkaido University, Kita-ku, Sapporo, Japan; Department of Electronics and Information Engineering, Hokkai-Gakuen University, Chuo-ku, Sapporo, Japan

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2006

Abstract

In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. A cross-language information retrieval system must handle many languages, so automatic extraction of bilingual word pairs from parallel corpora in those languages is important. However, previous work based on statistical methods is insufficient because of the sparse data problem. Our learning method automatically acquires rules that are effective against the sparse data problem, using only parallel corpora and no previously prepared bilingual resource (e.g., a bilingual dictionary or a machine translation system). We call this learning method Inductive Chain Learning (ICL). Moreover, a system using ICL can extract bilingual word pairs even from bilingual sentence pairs in which the grammatical structure of the source language differs from that of the target language, because the acquired rules carry the information needed to cope with differing source and target word orders in local parts of the sentence pairs. Evaluation experiments demonstrated that the recall of systems based on several statistical approaches improved when ICL was used.