Learning-based Intrasentence Segmentation for Efficient Translation of Long Sentences

Authors:
Sung-Dong Kim; Byoung-Tak Zhang; Yung Taek Kim
Affiliations:
Department of Computer Engineering, Hansung University, Samsun-dong Sungbuk-gu, Seoul, Korea. E-mail: sdkim@hansung.ac.kr; Department of Computer Engineering, Hansung University, Samsun-dong Sungbuk-gu, Seoul, Korea. E-mail: btzhang@comp.snu.ac.kr; Department of Computer Engineering, Hansung University, Samsun-dong Sungbuk-gu, Seoul, Korea. E-mail: ytkim@comp.snu.ac.kr
Venue:
Machine Translation
Year:
2001

Abstract

Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intrasentence segmentation has been proposed as a method for reducing parsing complexity. This paper presents a two-step segmentation method: (1) identifying potential segmentation positions in a sentence and (2) selecting an actual segmentation position amongst them. We have attempted to apply machine learning techniques to the segmentation task: "concept learning" and "genetic learning". By learning the "SegmentablePosition" concept, the rules for identifying potential segmentation positions are postulated. The selection of the actual segmentation position is based on a function whose parameters are determined by genetic learning. Experimental results are presented which illustrate the effectiveness of our approach to long-sentence parsing for MT. The results also show improved segmentation performance in comparison to other existing methods.
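
The two-step method described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the connective word list stands in for the learned "SegmentablePosition" rules, the three features and the linear scoring function are hypothetical, and the small evolutionary loop is only a generic stand-in for the genetic learning that determines the function's parameters.

```python
import random

# Hypothetical stand-in for the learned "SegmentablePosition" rules.
CONNECTIVES = {"and", "but", "because", "which", "that"}


def candidate_positions(tokens):
    """Step 1: indices i such that the sentence may be split before tokens[i]."""
    return [
        i
        for i in range(1, len(tokens))
        if tokens[i].lower() in CONNECTIVES or tokens[i - 1] == ","
    ]


def features(tokens, position):
    """Hypothetical features of a candidate split position."""
    n = len(tokens)
    balance = 1.0 - abs(2 * position - n) / n  # 1.0 when both halves are equal
    after_comma = 1.0 if tokens[position - 1] == "," else 0.0
    is_connective = 1.0 if tokens[position].lower() in CONNECTIVES else 0.0
    return (balance, after_comma, is_connective)


def select_position(tokens, weights):
    """Step 2: pick the candidate with the highest weighted feature score."""
    candidates = candidate_positions(tokens)
    if not candidates:
        return None

    def score(position):
        return sum(w * f for w, f in zip(weights, features(tokens, position)))

    return max(candidates, key=score)


def fitness(weights, data):
    """Number of training sentences whose gold split position is selected."""
    return sum(1 for tokens, gold in data if select_position(tokens, weights) == gold)


def evolve_weights(data, generations=30, pop_size=20, seed=0):
    """Generic evolutionary search over the weights (assumed, not the paper's)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, so the best fitness never decreases.
        population.sort(key=lambda w: fitness(w, data), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by a small Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0, 0.1) for pair in zip(a, b)])
        population = parents + children
    return max(population, key=lambda w: fitness(w, data))


if __name__ == "__main__":
    sentence = "I stayed home , because it rained and the roads were closed".split()
    print(candidate_positions(sentence))
    weights = evolve_weights([(sentence, 4)], generations=5, pop_size=8, seed=1)
    print(select_position(sentence, weights))
```

In this sketch the training data is a list of (token list, gold split index) pairs, and fitness is simply the number of sentences segmented at the gold position. A real system would derive both the candidate rules and the scoring features from annotated data rather than from hand-picked lists.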