Non-compositional language model and pattern dictionary development for Japanese compound and complex sentences

  • Authors:
  • Satoru Ikehara; Masato Tokuhisa; Jin'ichi Murakami

  • Affiliations:
  • Tottori University, Koyama-Minami Tottori, Japan; Tottori University, Koyama-Minami Tottori, Japan; Tottori University, Koyama-Minami Tottori, Japan

  • Venue:
  • COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2008

Abstract

To realize high-quality machine translation, we proposed a Non-Compositional Language Model and developed a sentence pattern dictionary of 226,800 pattern pairs for Japanese compound and complex sentences consisting of two or three clauses. When patterns were generated from a parallel corpus, the Compositional Constituents that could be generalized amounted to 74% of independent words, 24% of phrases, and only 15% of clauses. This means that, in Japanese-to-English MT, most of the translations found in the parallel corpus cannot be obtained by methods based on Compositional Semantics. The dictionary achieved a syntactic coverage of 98% and a semantic coverage of 78%, and it is expected to substantially improve translation quality.