Combining a two-step conditional random field model and a joint source channel model for machine transliteration

  • Authors:
  • Dong Yang;Paul Dixon;Yi-Cheng Pan;Tasuku Oonishi;Masanobu Nakamura;Sadaoki Furui

  • Affiliations:
  • Tokyo Institute of Techonology;Tokyo Institute of Techonology;Tokyo Institute of Techonology;Tokyo Institute of Techonology;Tokyo Institute of Techonology;Tokyo Institute of Techonology

  • Venue:
  • NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes our system for "NEWS 2009 Machine Transliteration Shared Task" (NEWS 2009). We only participated in the standard run, which is a direct orthographical mapping (DOP) between two languages without using any intermediate phonemic mapping. We propose a new two-step conditional random field (CRF) model for DOP machine transliteration, in which the first CRF segments a source word into chunks and the second CRF maps the chunks to a word in the target language. The two-step CRF model obtains a slightly lower top-1 accuracy when compared to a state-of-the-art n-gram joint source-channel model. The combination of the CRF model with the joint source-channel leads to improvements in all the tasks. The official result of our system in the NEWS 2009 shared task confirms the effectiveness of our system; where we achieved 0.627 top-1 accuracy for Japanese transliterated to Japanese Kanji(JJ), 0.713 for English-to-Chinese(E2C) and 0.510 for English-to-Japanese Katakana(E2J).