Intra-sentence segmentation based on support vector machines in English-Korean machine translation systems

Authors:
Yu-Seop Kim;Yu-Jin Oh
Affiliations:
Department of Computer Engineering, Hallym University, 39 Hallymdaehak-gil, Chuncheon, Gangwon-do 200-702, Republic of Korea;Department of Economics, University of Seoul, 90 Cheonong Dong, Dongdaemoon Gu, Seoul 130-743, Republic of Korea
Venue:
Expert Systems with Applications: An International Journal
Year:
2008

Abstract

This work addresses intra-sentence segmentation, performed before syntactic analysis, of long sentences of at least 20 words in an English-Korean machine translation system. Long sentences are known to consume enormous computational time and space when analyzed syntactically, and they can also produce poor translation results. To resolve this problem, we partitioned a long sentence into a few segments and analyzed each segment separately. To partition the sentence, we first identified candidate segmentation positions in the sentence. We then generated input vectors representing the lexical contexts of the corresponding candidates and used the support vector machine (SVM) algorithm to learn and recognize the appropriate segmentation positions. We used three kernel functions (the linear, polynomial, and Gaussian kernels) to find optimal hyperplanes that classify proper positions, and we compared the results obtained with each kernel function. In the experiments, we obtained F-measure values of 0.81, 0.83, and 0.79 with the linear, polynomial, and Gaussian kernels, respectively.
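For readers unfamiliar with the three kernels and the evaluation metric named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not the authors' implementation: the function names, the parameter values (`degree`, `coef0`, `gamma`), and the toy vectors are hypothetical, and the abstract does not state which variant of the F-measure was used, so the balanced form (harmonic mean of precision and recall) is assumed here.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree.

    degree and coef0 are illustrative defaults, not values from the paper.
    """
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2).

    gamma is an illustrative default, not a value from the paper.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (assumed variant): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two hypothetical binary context vectors for segmentation candidates.
    u = [1, 0, 1]
    v = [1, 1, 0]
    print("linear:", linear_kernel(u, v))
    print("polynomial:", polynomial_kernel(u, v))
    print("gaussian:", gaussian_kernel(u, v))
    # Hypothetical counts of correctly and incorrectly predicted positions.
    print("F-measure:", f_measure(8, 2, 2))
```

In an SVM, the chosen kernel replaces the dot product between input vectors, which is why swapping the kernel changes the shape of the decision boundary while the training procedure stays the same.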