Stemming is an essential process in information retrieval. Although very simple stemming algorithms exist for inflectional languages, the situation is quite different for agglutinative languages, and it becomes harder still when a significant portion of the vocabulary is new or unknown. This paper explores stemming for an agglutinative language, specifically Korean, through unsupervised morphology learning. We use only a raw corpus and no dictionary. Unlike heuristic algorithms, which lack theoretical grounding, our method is based on widely accepted statistical methods. Although the method is currently applied only to Korean, it uses no language-specific knowledge and can therefore be adapted to other agglutinative languages with similar characteristics.