Applying mpaligner to machine transliteration with Japanese-specific heuristics

  • Authors: Yoh Okuno

  • Affiliations: Job Hunter

  • Venue: NEWS '12 Proceedings of the 4th Named Entity Workshop
  • Year: 2012

Abstract

We developed a machine transliteration system for NEWS 2012 that combines mpaligner (an improvement on m2m-aligner), DirecTL+, and several Japanese-specific heuristics. Our results show that mpaligner performs substantially better than m2m-aligner and that the Japanese-specific heuristics are effective for the JnJk and EnJa tasks. Whereas m2m-aligner handles long alignments poorly, mpaligner performs well on them and imposes no length limit. Handling long alignments is crucial in the JnJk and EnJa tasks. An experiment revealed that de-romanization, the reverse operation of romanization, is crucial for the JnJk task. For the EnJa task, the mora proved to be the best alignment unit for Japanese.
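To make the two Japanese-specific ideas in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the system described in the paper: `split_morae` and `deromanize` are hypothetical helper names, the romaji table is deliberately tiny, and the segmentation rule (a small kana attaches to the preceding kana, while ッ, ン, and ー each count as their own mora) follows the standard phonological definition, which may differ from the units the paper actually used.

```python
from typing import List

# Small kana that attach to the preceding kana to form one mora (キ + ャ -> キャ).
SMALL_KANA = set("ァィゥェォャュョヮ")


def split_morae(katakana: str) -> List[str]:
    """Split a katakana string into mora-sized units (assumed rule, see lead-in)."""
    morae: List[str] = []
    for ch in katakana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch      # glue small kana onto the previous unit
        else:
            morae.append(ch)     # ordinary kana, ッ, ン, and ー start a new unit
    return morae


# Hypothetical, deliberately incomplete romaji-to-hiragana table.
# A usable table would have to cover every syllable of the romanization scheme.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ta": "た", "na": "な", "no": "の", "ya": "や", "yo": "よ",
    "kyo": "きょ", "n": "ん",
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TO_KANA)


def deromanize(romaji: str) -> str:
    """Convert romaji back to kana by greedy longest-match lookup."""
    text = romaji.lower()
    out: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "kyo" wins over "k" + "yo".
        for size in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += size
                break
        else:
            out.append(text[i])  # unknown character: pass it through unchanged
            i += 1
    return "".join(out)


print(split_morae("キャンディー"))
print(deromanize("tanaka"))
```

Aligning on such units means an aligner sees キャ as one symbol instead of two characters, and de-romanized input lets kana rather than Latin letters be aligned with kanji. Greedy longest-match is only one possible strategy; it cannot resolve genuinely ambiguous romanizations such as an "n" followed by a vowel across a syllable boundary.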