Transliteration using a phrase-based statistical machine translation system to re-score the output of a joint multigram model

  • Authors:
  • Andrew Finch; Eiichiro Sumita

  • Affiliations:
  • Keihanna Science City, Japan; Keihanna Science City, Japan

  • Venue:
  • NEWS '10 Proceedings of the 2010 Named Entities Workshop
  • Year:
  • 2010

Abstract

The system presented in this paper combines two techniques to transliterate directly from grapheme to grapheme. The approach makes no language-specific assumptions and uses no dictionaries or explicit phonetic information; it transforms sequences of tokens in the source language directly into sequences of tokens in the target language. All the language pairs in our experiments were transliterated by applying this technique in a single unified manner. We use hypothesis rescoring to integrate the models of two state-of-the-art techniques: phrase-based statistical machine translation (SMT) and a joint multigram model. The joint multigram model generated an n-best list of transliteration hypotheses, which were then re-scored using the models of the phrase-based SMT system. The two models' scores for each hypothesis were linearly interpolated to produce a final score that was used to re-rank the hypotheses. In our experiments on development data, the combined system substantially outperformed both of its component systems.
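
The re-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hypothesis` class, the `rerank` function, and the `weight` parameter are hypothetical names, and the sketch assumes that both scores are log-probabilities (higher is better) and that a single interpolation weight between 0 and 1 is used. The abstract does not state the actual weights or score scales.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate transliteration from the n-best list (hypothetical type)."""
    text: str
    multigram_score: float  # assumed: log-probability from the joint multigram model
    smt_score: float        # assumed: log-probability from the phrase-based SMT models


def rerank(nbest, weight=0.5):
    """Re-rank hypotheses by a linear interpolation of the two model scores.

    `weight` is the share given to the joint multigram score; the remainder
    goes to the SMT score. The value 0.5 is an illustrative default only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def combined(h):
        return weight * h.multigram_score + (1.0 - weight) * h.smt_score

    # Highest combined score first.
    return sorted(nbest, key=combined, reverse=True)


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    nbest = [
        Hypothesis("a", -1.0, -3.0),
        Hypothesis("b", -1.5, -1.0),
        Hypothesis("c", -2.0, -4.0),
    ]
    for h in rerank(nbest, weight=0.5):
        print(h.text)
```

With `weight=1.0` the ranking reduces to that of the joint multigram model alone, and with `weight=0.0` to that of the SMT models alone, so in a setup like this the weight would typically be tuned on development data.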