A part of speech estimation method for Japanese unknown words using a statistical model of morphology and context

  • Authors: Masaaki Nagata
  • Affiliations: NTT Cyber Space Laboratories, Kanagawa, Japan
  • Venue: ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
  • Year: 1999

Abstract

We present a statistical model of Japanese unknown words consisting of a set of length and spelling models classified by the character types that constitute a word. The point is quite simple: different character sets should be treated differently and the changes between character types are very important because Japanese script has both ideograms like Chinese (kanji) and phonograms like English (katakana). Both word segmentation accuracy and part of speech tagging accuracy are improved by the proposed model. The model can achieve 96.6% tagging accuracy if unknown words are correctly segmented.
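
To make the idea of classifying words by character type concrete, here is a minimal sketch in Python. It is not the paper's model: the abstract does not list the character classes or the probability estimates used, so the class names, the Unicode ranges, and the helper names `char_type` and `type_signature` are assumptions chosen for illustration. The sketch only shows how a word can be mapped to the sequence of character types it contains, which is the kind of key a per-type length or spelling model could be indexed by.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Return a coarse character class for a single character.

    The classes and Unicode ranges are illustrative assumptions,
    not the inventory used in the paper.
    """
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (basic block only)
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    if ch.isdigit():
        return "digit"
    return "other"


def type_signature(word: str) -> tuple:
    """Collapse runs of the same character class into one entry.

    The result records which character types occur and where the
    type changes, e.g. kanji followed by hiragana.
    """
    return tuple(t for t, _ in groupby(char_type(c) for c in word))


if __name__ == "__main__":
    for w in ["東京", "コンピュータ", "食べる"]:
        print(w, type_signature(w))
```

Under these assumptions, a word written entirely in katakana and a word that switches from kanji to hiragana receive different signatures, so each could be handled by its own length and spelling distribution.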