Stemming is the process of automatically extracting the base form of a given word in a language. Assamese is a morphologically rich Indo-Aryan language with relatively free word order, spoken in the north-eastern part of India and written in the Assamese-Bengali script. Because Assamese is among the less computationally studied languages, we aim to extract the stem of a given Assamese word. We adopt a suffix-stripping approach together with a rule engine that generates all possible suffix sequences. The suffix-stripping approach reaches 82% accuracy once a root-word list of approximately 20,000 entries is added.
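The pipeline described above (a rule engine that enumerates suffix sequences, followed by stripping checked against a root-word list) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the romanized suffixes, the three-slot ordering, the two-character minimum stem length, and the three-word root list are toy placeholders, not the paper's actual Assamese rules or data.

```python
from itertools import product

# Hypothetical, romanized suffix slots, listed in the order they attach to a stem.
# These are toy placeholders, not the paper's actual Assamese suffix inventory.
SUFFIX_SLOTS = [
    ["bor", "hat"],  # slot 1
    ["r", "k"],      # slot 2
    ["he"],          # slot 3
]

# Toy stand-in for the root-word list (the paper reports roughly 20,000 entries).
ROOT_WORDS = {"manuh", "kitap", "nagar"}


def generate_suffix_sequences(slots):
    """Enumerate every non-empty concatenation that takes at most one
    suffix from each slot, preserving slot order."""
    sequences = set()
    for choice in product(*[[""] + slot for slot in slots]):
        sequence = "".join(choice)
        if sequence:
            sequences.add(sequence)
    # Longest first, so the longest matching sequence is tried first;
    # ties are broken alphabetically to keep the order deterministic.
    return sorted(sequences, key=lambda s: (-len(s), s))


def stem(word, sequences, roots, min_stem_len=2):
    """Strip a suffix sequence from `word`.

    A candidate stem found in `roots` wins immediately. Otherwise the
    candidate left by the longest matching sequence is returned, and the
    word itself is returned when nothing matches.
    """
    if word in roots:
        return word  # a known root is never stripped
    fallback = None
    for sequence in sequences:
        if word.endswith(sequence) and len(word) - len(sequence) >= min_stem_len:
            candidate = word[: -len(sequence)]
            if candidate in roots:
                return candidate
            if fallback is None:
                fallback = candidate
    return fallback if fallback is not None else word


SEQUENCES = generate_suffix_sequences(SUFFIX_SLOTS)

if __name__ == "__main__":
    for token in ["manuhborr", "kitaphe", "nagar", "ghorbor"]:
        print(token, "->", stem(token, SEQUENCES, ROOT_WORDS))
```

In this sketch the root-word list does two jobs: it confirms a candidate stem, and it protects a word such as the toy root `nagar` from losing a final `r` that only looks like a suffix. This is one plausible reading of why adding a root-word list would improve accuracy; the abstract does not spell out the mechanism.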