Automatic Stemming for Indexing of an Agglutinative Language

Authors:
Sehyeong Cho;Seung-Soo Han
Venue:
ADVIS '02 Proceedings of the Second International Conference on Advances in Information Systems
Year:
2002

Abstract

Stemming is an essential process in information retrieval. Although very simple stemming algorithms exist for inflectional languages, the situation is quite different for agglutinative languages, and it becomes even more difficult when a significant portion of the vocabulary is new or unknown. This paper explores the possibility of stemming an agglutinative language, specifically Korean, through unsupervised morphology learning. We use only a raw corpus and no dictionary. Unlike heuristic algorithms, which lack theoretical grounding, this method is based on widely accepted statistical methods. Although the method is currently applied only to Korean, it can be adapted to other agglutinative languages with similar characteristics, since it uses no language-specific knowledge.
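
To make the idea of dictionary-free stemming concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify; it is a toy illustration that learns stem/suffix candidates purely from co-occurrence counts in a raw word list. The function names (`learn_splits`, `stem`), the parameters `max_suffix_len` and `min_support`, and the six-word corpus are all assumptions made for this example.

```python
from collections import defaultdict

def learn_splits(words, max_suffix_len=2):
    """Record which suffixes follow each candidate stem, and which stems
    precede each candidate suffix, over all distinct word types."""
    suffixes_of = defaultdict(set)
    stems_of = defaultdict(set)
    for word in set(words):
        # Try every split that leaves a non-empty stem and a short suffix.
        for k in range(1, min(max_suffix_len, len(word) - 1) + 1):
            candidate, suffix = word[:-k], word[-k:]
            suffixes_of[candidate].add(suffix)
            stems_of[suffix].add(candidate)
    return suffixes_of, stems_of

def stem(word, suffixes_of, stems_of, max_suffix_len=2, min_support=2):
    """Return the best-supported stem, or the word itself if no split
    is seen with at least `min_support` distinct partners on both sides."""
    best, best_score = word, 0
    for k in range(1, min(max_suffix_len, len(word) - 1) + 1):
        candidate, suffix = word[:-k], word[-k:]
        n_suffixes = len(suffixes_of.get(candidate, ()))
        n_stems = len(stems_of.get(suffix, ()))
        if n_suffixes < min_support or n_stems < min_support:
            continue
        score = n_suffixes * n_stems
        if score > best_score:
            best, best_score = candidate, score
    return best

# Hypothetical raw corpus: "school" and "friend", each followed by
# a subject, object, or locative particle.
corpus = ["학교가", "학교를", "학교에", "친구가", "친구를", "친구에"]
suffixes_of, stems_of = learn_splits(corpus)

print(stem("학교가", suffixes_of, stems_of))  # -> 학교
print(stem("나무", suffixes_of, stems_of))    # -> 나무 (no supported split)
```

A split is accepted only when the stem occurs with several suffixes and the suffix occurs with several stems, so a one-off substring such as 교가 is not mistaken for a particle. A realistic system would replace the raw counts with a proper statistical model and a much larger corpus.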