Foundations of statistical natural language processing
A compression-based algorithm for Chinese word segmentation
Computational Linguistics
Segmenting a sentence into morphemes using statistic information between words
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Improving partial parsing based on error-pattern analysis for a Korean grammar-checker
ACM Transactions on Asian Language Information Processing (TALIP)
A hybrid approach to automatic word-spacing in Korean
IEA/AIE'2004 Proceedings of the 17th international conference on Innovations in applied artificial intelligence
Category-pattern-based Korean word-spacing
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
In this paper, we propose an automatic word-spacing method for a Korean text preprocessing system that resolves the problem of context-dependent word-spacing. The method combines a stochastic method with partial parsing. First, the stochastic method splits an input sentence into a candidate-word sequence using word unigrams and syllable bigrams. Second, the system engages a partial parsing module based on the asymmetric relation between the candidate words. The partial parsing module manages the governing relationship using words that are registered in the knowledge base as having a high probability of spacing errors. These words serve as parsing trigger points based on their linguistic information, and they determine the parsing direction as well as the parsing scope. By combining the stochastic and linguistic methods, the automatic word-spacing system becomes robust against the problem of context-dependent word-spacing. An average 8.98% improvement in the total error rate is obtained for inner and external data.
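The first, stochastic step can be illustrated with a small sketch. The abstract does not specify the model, so everything below is an assumption rather than the authors' implementation: the function name `segment`, the tables `word_logp` (word-unigram log-probabilities) and `space_logp` (a syllable-bigram log-probability that a space falls between two adjacent syllables), and the `unk_logp` fallback for unknown single syllables are all hypothetical. The sketch uses dynamic programming to choose the candidate-word sequence with the highest total score.

```python
import math


def segment(text, word_logp, space_logp=None, unk_logp=-20.0, max_len=8):
    """Split an unspaced string into its highest-scoring candidate-word sequence.

    word_logp:  hypothetical word-unigram log-probabilities.
    space_logp: hypothetical syllable-bigram table; key is the two syllables
                around a boundary, value is the log-probability of a space there.
    unk_logp:   assumed score for a single syllable not found in word_logp.
    """
    space_logp = space_logp or {}
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_logp:
                score = word_logp[word]
            elif end - start == 1:
                score = unk_logp  # unknown single syllable keeps the search connected
            else:
                continue
            if start > 0:
                # Syllable-bigram term for placing a space before this word.
                score += space_logp.get(text[start - 1] + text[start], 0.0)
            total = best[start] + score
            if total > best[end]:
                best[end] = total
                back[end] = start

    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]
```

With toy, made-up scores, the classic ambiguous string 아버지가방에들어가신다 shows why such a model alone is context-dependent: the same input can come out as 아버지가 / 방에 / 들어가신다 or as 아버지 / 가방에 / 들어가신다 depending only on the table values. This is the kind of case the partial parsing step described above is meant to resolve.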