Category-pattern-based korean word-spacing

Authors:
Mi-young Kang;Sung-won Jung;Hyuk-chul Kwon
Affiliations:
Korean Language Processing Laboratory, Department of Computer Science Engineering, Pusan National University;Korean Language Processing Laboratory, Department of Computer Science Engineering, Pusan National University;Korean Language Processing Laboratory, Department of Computer Science Engineering, Pusan National University
Venue:
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
Year:
2006

Citing 3
Cited 0

A stochastic finite-state word-segmentation algorithm for Chinese

Computational Linguistics
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Combined word-spacing method for disambiguating korean texts

AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

It is difficult to cope with data sparseness, unless augmenting the size of the dictionary in a stochastic-based word-spacing model is an option. To resolve both data sparseness and the dictionary memory size problem, this paper describes the process of dynamically providing candidate words to detect correct words using morpheme unigrams and their categories. Each candidate word's probability was estimated from the morpheme probability, which was weighted according to its category. The category weights were trained to minimize the mean of the errors between the observed probability of a word and that estimated by the word's individual morpheme probability weighted by its category power in a category pattern for producing the given word.