Learning-based Intrasentence Segmentation for Efficient Translation of Long Sentences

  • Authors:
  • Sung-Dong Kim;Byoung-Tak Zhang;Yung Taek Kim

  • Affiliations:
  • Department of Computer Engineering, Hansung University, Samsun-dong Sungbuk-gu, Seoul, Korea E-mail: sdkim@hansung.ac.kr;Department of Computer Engineering, Hansung University, Samsun-dong Sungbuk-gu, Seoul, Korea E-mail: btzhang@comp.snu.ac.kr;Department of Computer Engineering, Hansung University, Samsun-dong Sungbuk-gu, Seoul, Korea E-mail: ytkim@comp.snu.ac.kr

  • Venue:
  • Machine Translation
  • Year:
  • 2001

Abstract

Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intrasentence segmentation has been proposed as a method for reducing parsing complexity. This paper presents a two-step segmentation method: (1) identifying potential segmentation positions in a sentence and (2) selecting an actual segmentation position amongst them. We have attempted to apply machine learning techniques to the segmentation task: "concept learning" and "genetic learning". By learning the "SegmentablePosition" concept, the rules for identifying potential segmentation positions are postulated. The selection of the actual segmentation position is based on a function whose parameters are determined by genetic learning. Experimental results are presented which illustrate the effectiveness of our approach to long-sentence parsing for MT. The results also show improved segmentation performance in comparison to other existing methods.
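
The two-step method described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the learned rules, the selection function, or the genetic-learning setup, so everything below is an assumption made for illustration only: the cue-word list stands in for the learned "SegmentablePosition" rules, the two features and the weighted-sum score stand in for the selection function, and the small genetic search (elitist selection, uniform crossover, Gaussian mutation) stands in for the paper's genetic learning. All function names are hypothetical.

```python
import random

# Hypothetical cue words. The paper learns rules for segmentable positions;
# this fixed list is only a stand-in for those rules.
CUE_WORDS = {",", "and", "but", "because", "which", "that"}

def candidate_positions(tokens):
    """Step 1: indices i where a cut before tokens[i] is allowed."""
    return [i for i in range(1, len(tokens)) if tokens[i] in CUE_WORDS]

def features(tokens, i):
    """Two illustrative (assumed) features of cutting before tokens[i]."""
    n = len(tokens)
    balance = 1.0 - abs(2 * i - n) / n   # 1.0 when the cut is exactly in the middle
    is_comma = 1.0 if tokens[i] == "," else 0.0
    return (balance, is_comma)

def select_position(tokens, weights):
    """Step 2: pick the candidate with the highest weighted score, or None."""
    cands = candidate_positions(tokens)
    if not cands:
        return None
    def score(i):
        return sum(w * f for w, f in zip(weights, features(tokens, i)))
    return max(cands, key=score)

def fitness(weights, examples):
    """Fraction of examples whose selected position matches the annotated one."""
    hits = sum(select_position(toks, weights) == gold for toks, gold in examples)
    return hits / len(examples)

def evolve(examples, pop_size=20, generations=30, seed=0):
    """Tiny genetic search over weight vectors (assumed setup, not the paper's)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, examples), reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [w + rng.gauss(0, 0.1) for w in child]     # Gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, examples))

tokens = "the parser failed , and the system retried the long sentence".split()
examples = [(tokens, 3)]          # made-up annotation: cut before the comma
best = evolve(examples)
print(candidate_positions(tokens), select_position(tokens, best))
```

In this sketch the fitness is simply agreement with hand-annotated cut positions; a real system would need a larger annotated corpus and features informed by the parser whose complexity the segmentation is meant to reduce.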