Analysis and evaluation of stemming algorithms: a case study with Assamese
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
Stemming is a common method for morphological normalization of natural language texts. Modern information retrieval systems rely on such normalization for automatic document processing tasks. High-quality stemming is difficult in highly inflectional Indic languages, and little research has addressed the design of stemming algorithms for them. In this study, we focus on stemming texts in Assamese, a low-resource Indic language spoken in the north-eastern part of India by approximately 30 million people. Stemming is hard in Assamese because morphological inflections commonly appear as single-letter suffixes: more than 50% of the inflections in Assamese take this form. Such single-letter inflections cause ambiguity when predicting the underlying root word. We therefore propose a new method that combines a rule-based algorithm for predicting multiple-letter suffixes with an HMM-based algorithm for predicting single-letter suffixes. The combined approach can predict morphologically inflected words with 92% accuracy.
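The abstract does not give the details of either component, so the following is only a minimal sketch of how such a two-stage stemmer might be wired together, not the authors' implementation. Everything in it is an assumption made for illustration: the suffix lists are toy Latin-script placeholders rather than real Assamese suffixes, the HMM uses two hypothetical hidden states per word ("S" if the final letter is a suffix, "N" if it belongs to the root), the observation is each word's final letter, and all probabilities are invented rather than estimated from data.

```python
import math

# Hypothetical toy inventories (placeholders, not real Assamese data).
MULTI_SUFFIXES = ("bilak", "khon", "bor")
SINGLE_SUFFIXES = ("e", "k", "r")

# Assumed HMM: one hidden state per word.
# "S" = the final letter is a suffix, "N" = it is part of the root.
STATES = ("S", "N")
START = {"S": 0.4, "N": 0.6}
TRANS = {"S": {"S": 0.3, "N": 0.7}, "N": {"S": 0.5, "N": 0.5}}
# Emission of the final letter; "*" covers every letter not listed.
EMIT = {
    "S": {"e": 0.5, "k": 0.3, "r": 0.2},
    "N": {"e": 0.1, "k": 0.1, "r": 0.1, "*": 0.7},
}
FLOOR = 1e-6  # probability for letters a state cannot otherwise emit


def emit(state, letter):
    table = EMIT[state]
    return table.get(letter, table.get("*", FLOOR))


def strip_multi(word, min_stem=2):
    """Rule-based step: remove the longest matching multi-letter suffix."""
    for suffix in sorted(MULTI_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], True
    return word, False


def viterbi(letters):
    """Most likely state sequence for a sequence of final letters (log space)."""
    if not letters:
        return []
    score = {s: math.log(START[s]) + math.log(emit(s, letters[0])) for s in STATES}
    back = []
    for letter in letters[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(emit(s, letter))
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def stem_sentence(words, min_stem=2):
    """Apply the rules first, then let the HMM decide on single-letter suffixes.

    Assumes every token is a non-empty string.
    """
    stripped = [strip_multi(w, min_stem) for w in words]
    tags = viterbi([stem[-1] for stem, _ in stripped])
    stems = []
    for (stem, by_rule), tag in zip(stripped, tags):
        if (not by_rule and tag == "S" and stem[-1] in SINGLE_SUFFIXES
                and len(stem) - 1 >= min_stem):
            stem = stem[:-1]
        stems.append(stem)
    return stems
```

With these toy parameters, `stem_sentence(["tokenbor", "kame"])` returns `["token", "kam"]`: the first word is handled by the rule-based step, and the second loses its final letter only because the decoded state for it is "S". In a real system the transition and emission tables would be estimated from annotated text, and the state and observation design could differ substantially from this sketch.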