Compound noun segmentation based on lexical data extracted from corpus

  • Authors:
  • Juntae Yoon

  • Affiliations:
  • IRCS, University of Pennsylvania, 3401 Walnut St., Suite 400A, Philadelphia, PA 19104-6228, USA. E-mail: jtyoon@linc.cis.upenn.edu

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2001

Abstract

Compound noun segmentation is one of the crucial problems in Korean language processing because a series of nouns in Korean may be written without spaces in real text, which makes it difficult to identify the morphological constituents of the compound. This paper presents a method of Korean compound noun segmentation based on lexical data extracted from a corpus. The segmentation consists of two tasks. First, a Hand-Built Segmentation Dictionary (HBSD) is used to segment compound nouns that occur frequently or need exceptional processing. Second, a segmentation algorithm using data from a corpus is proposed, where simple nouns and their frequencies are stored in a Simple Noun Dictionary (SND) for segmentation. The analysis is based on modified tabular parsing using a min-max operation. Our experiments show an accuracy rate of about 97.29%.
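
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring function, so the sketch makes an assumption: a candidate segmentation is scored by the lowest SND frequency among its constituents (the "min"), and the chart keeps the candidate with the highest such score for each span (the "max"). The names `HBSD`, `SND`, and `segment`, and all dictionary entries and frequencies, are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionaries; the entries and counts are invented for illustration.
HBSD: Dict[str, List[str]] = {"대한민국": ["대한민국"]}      # hand-built exceptions
SND: Dict[str, int] = {"정보": 120, "검색": 80, "시스템": 200}  # simple noun -> frequency

Cell = Optional[Tuple[int, List[str]]]  # (score, segmentation) or None


def segment(compound: str,
            hbsd: Dict[str, List[str]] = HBSD,
            snd: Dict[str, int] = SND) -> Optional[List[str]]:
    """Return a segmentation of `compound`, or None if no analysis is found."""
    # Stage 1: the hand-built dictionary takes priority over the corpus data.
    if compound in hbsd:
        return list(hbsd[compound])

    # Stage 2: CYK-style chart over character spans.
    # chart[i][j] holds the best (score, segmentation) for compound[i:j].
    n = len(compound)
    chart: List[List[Cell]] = [[None] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best: Cell = None
            word = compound[i:j]
            if word in snd:
                # The span is itself a known simple noun.
                best = (snd[word], [word])
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                if left is None or right is None:
                    continue
                # Assumed min-max rule: a split is as good as its weakest
                # constituent; keep the split whose weakest part is strongest.
                score = min(left[0], right[0])
                if best is None or score > best[0]:
                    best = (score, left[1] + right[1])
            chart[i][j] = best

    result = chart[0][n]
    return result[1] if result is not None else None


print(segment("정보검색시스템"))  # ['정보', '검색', '시스템']
```

In this sketch an input found in the HBSD is returned directly without consulting the chart, and an input that cannot be covered by SND entries yields `None`. A real system would also need a policy for unknown nouns and for ties between equally scored candidates, which the abstract does not specify.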