Japanese word segmentation by hidden Markov model

  • Authors:
  • Constantine P. Papageorgiou

  • Affiliations:
  • BBN Systems and Technologies, Cambridge, MA

  • Venue:
  • HLT '94 Proceedings of the workshop on Human Language Technology
  • Year:
  • 1994

Quantified Score

Hi-index 0.00

Visualization

Abstract

The processing of Japanese text is complicated by the fact that there are no word delimiters. To segment Japanese text, systems typically use knowledge-based methods and large lexicons. This paper presents a novel approach to Japanese word segmentation which avoids the need for Japanese word lexicons and explicit rule bases. The algorithm utilizes a hidden Markov model, a stochastic process, to determine word boundaries. This method has achieved 91% accuracy in segmenting words in a test corpus.