Building a bilingual dictionary from a Japanese-Chinese patent corpus

  • Authors:
  • Keiji Yasuda; Eiichiro Sumita

  • Affiliations:
  • National Institute of Information and Communications Technology, Keihanna Science City, Kyoto, Japan; National Institute of Information and Communications Technology, Keihanna Science City, Kyoto, Japan

  • Venue:
  • CICLing'13: Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing, Volume 2
  • Year:
  • 2013

Abstract

In this paper, we propose an automatic method for building a bilingual dictionary from a Japanese-Chinese parallel corpus. The method combines character similarity between Japanese and Chinese with a statistical machine translation (SMT) framework, applied as a cascade of three steps. The first step extracts word translation pairs from the parallel corpus based on the similarity between Japanese kanji (Chinese characters used in Japanese writing) and simplified Chinese characters. The second step trains phrase tables using two different SMT training tools and then extracts the word translation pairs common to both tables. The third step trains an SMT system using the word translation pairs obtained in the first and second steps. Experimental results show that the accuracy of the extracted word translation pairs ranges from 59.3% to 92.1%, depending on the cascade step.
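The first two cascade steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the kanji-to-simplified conversion table, the similarity measure, the threshold, or the SMT training tools, so everything below is an assumption for illustration only: `KANJI_TO_SIMPLIFIED` is a tiny hypothetical excerpt of a conversion table, the character-level Dice coefficient with a 0.5 threshold stands in for the paper's (unstated) similarity measure, and the phrase tables are assumed to use the Moses-style `source ||| target ||| scores` text format. The function names are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical excerpt of a kanji -> simplified Chinese conversion table.
# A real system would need a table covering thousands of characters.
KANJI_TO_SIMPLIFIED = {
    "電": "电",
    "気": "气",
    "処": "处",
    "発": "发",
}


def normalize(word: str) -> str:
    """Map each kanji to its simplified form; unmapped characters pass through unchanged."""
    return "".join(KANJI_TO_SIMPLIFIED.get(ch, ch) for ch in word)


def dice(a: str, b: str) -> float:
    """Character-level Dice coefficient (multiset overlap) between two strings."""
    if not a or not b:
        return 0.0
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2 * overlap / (len(a) + len(b))


def step1_char_similarity(sentence_pairs, threshold=0.5):
    """Step 1 sketch: pair Japanese and Chinese words from the same sentence pair
    whose normalized characters are similar enough (assumed measure and threshold).

    sentence_pairs: iterable of (ja_words, zh_words), already word-segmented.
    """
    pairs = set()
    for ja_words, zh_words in sentence_pairs:
        for ja in ja_words:
            ja_norm = normalize(ja)
            for zh in zh_words:
                if dice(ja_norm, zh) >= threshold:
                    pairs.add((ja, zh))
    return pairs


def load_word_entries(phrase_table_lines):
    """Read Moses-style 'src ||| tgt ||| ...' lines, keeping single-word entries only."""
    entries = set()
    for line in phrase_table_lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 2:
            continue
        src, tgt = fields[0], fields[1]
        if src and tgt and " " not in src and " " not in tgt:
            entries.add((src, tgt))
    return entries


def step2_common_pairs(table_a_lines, table_b_lines):
    """Step 2 sketch: keep word pairs found in the phrase tables of both training tools."""
    return load_word_entries(table_a_lines) & load_word_entries(table_b_lines)


if __name__ == "__main__":
    corpus = [(["電気", "装置", "を", "発明"], ["电气", "装置", "的", "发明"])]
    print(sorted(step1_char_similarity(corpus)))

    table_a = ["発明 ||| 发明 ||| 0.8 0.7", "電気 装置 ||| 电气 装置 ||| 0.5"]
    table_b = ["発明 ||| 发明 ||| 0.6", "装置 ||| 设备 ||| 0.4"]
    print(step2_common_pairs(table_a, table_b))
```

Intersecting the two phrase tables keeps only the pairs that both tools agree on, which presumably favors precision over recall; the third step (training an SMT system on the accumulated pairs) is not sketched here.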