Machine translation without words through substring alignment

  • Authors:
  • Graham Neubig;Taro Watanabe;Shinsuke Mori;Tatsuya Kawahara

  • Affiliations:
  • Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan and National Institute of Information and Communications Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan;National Institute of Information and Communications Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan;Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan;Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year:
  • 2012

Abstract

In this paper, we demonstrate that accurate machine translation is possible without the concept of "words," treating MT as a problem of transformation between character strings. We achieve this by applying phrasal inversion transduction grammar alignment techniques to character strings to train a character-based translation model, and then using this model within the phrase-based MT framework. We also propose a look-ahead parsing algorithm and substring-informed prior probabilities to make alignment more effective and efficient. In an evaluation across several language pairs, we demonstrate that character-based translation can achieve results comparable to those of word-based systems while effectively translating unknown and uncommon words.
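The abstract does not say how the substring-informed priors are computed. As a rough intuition for the kind of statistic involved, the following is a minimal, hypothetical sketch that counts how often a source character substring and a target character substring co-occur in the same sentence pair of a parallel corpus. The function names `substrings` and `cooccurrence_counts` and the `max_len` cutoff are illustrative assumptions, not the authors' method. The naive enumeration is quadratic in sentence length, so it is only suitable for toy inputs.

```python
from collections import Counter
from itertools import product


def substrings(s, max_len):
    """Return the set of distinct substrings of s that are at most max_len characters long."""
    return {
        s[i:j]
        for i in range(len(s))
        for j in range(i + 1, min(i + max_len, len(s)) + 1)
    }


def cooccurrence_counts(corpus, max_len=3):
    """Count, for each (source substring, target substring) pair, the number of
    sentence pairs in which both substrings appear.

    Hypothetical illustration only: each pair is counted at most once per
    sentence pair, and no smoothing or normalization is applied.
    """
    counts = Counter()
    for src, tgt in corpus:
        for pair in product(substrings(src, max_len), substrings(tgt, max_len)):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Toy parallel corpus treated purely as character strings (no word segmentation).
    corpus = [("ab", "xy"), ("ac", "xz")]
    counts = cooccurrence_counts(corpus)
    print(counts[("a", "x")])  # "a" and "x" co-occur in both sentence pairs
```

In a character-based setting, counts like these could be turned into a prior that favors substring pairs which consistently co-occur. How such counts are normalized and combined with the alignment model is left open here.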