Analysis and evaluation of stemming algorithms: a case study with Assamese

Authors:
Navanath Saharia; Utpal Sharma; Jugal Kalita
Affiliations:
Tezpur University, Napaam, India; Tezpur University, Napaam, India; University of Colorado, Colorado Springs
Venue:
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
Year:
2012

Abstract

Stemming is the process of automatically extracting the base form (stem) of a given word of a language. Assamese is a morphologically rich Indo-Aryan language with relatively free word order, spoken in the north-eastern part of India and written in the Assamese-Bengali script. Because Assamese is among the less computationally studied languages, our aim is to extract the stem of a given word. We adopt a suffix stripping approach together with a rule engine that generates all possible suffix sequences. The suffix stripping approach achieves 82% accuracy after adding a root-word list of approximately 20,000 words.
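The abstract gives only the outline of the method, so the following Python sketch illustrates the general idea under stated assumptions rather than reproducing the authors' system. It assumes that a rule is an ordered list of suffix slots, expands every rule into concrete suffix sequences with `itertools.product`, and then strips a matching sequence, preferring a stem that appears in a root-word list and otherwise falling back to the longest match. The romanized suffixes, the slot lists, the rule set, the toy root list, and the functions `generate_suffix_sequences` and `stem` are all hypothetical and chosen only for illustration; they are not the paper's actual suffix inventory, rules, or code.

```python
from itertools import product

# HYPOTHETICAL suffix slots, written in a rough romanization for readability.
# They illustrate the idea of suffix classes; they are not the paper's inventory.
CLASSIFIERS = ["tu", "bor"]          # e.g. a classifier and a plural marker
CASE_MARKERS = ["k", "r", "t", "e"]  # e.g. a few case endings

# HYPOTHETICAL rules: each rule is an ordered list of slots that may follow a root.
RULES = [
    [CLASSIFIERS],
    [CASE_MARKERS],
    [CLASSIFIERS, CASE_MARKERS],
]


def generate_suffix_sequences(rules):
    """Expand every rule into all concrete suffix strings it allows."""
    sequences = set()
    for rule in rules:
        # product() picks one suffix from each slot, in order.
        for combo in product(*rule):
            sequences.add("".join(combo))
    # Longest first (ties broken alphabetically) so the stripper tries
    # to remove as much inflection as possible before shorter matches.
    return sorted(sequences, key=lambda s: (-len(s), s))


def stem(word, suffixes, root_words):
    """Strip a suffix sequence from word, preferring a stem found in root_words."""
    if word in root_words:
        return word  # already a known root, so nothing is stripped
    # Every stem obtainable by removing one whole suffix sequence,
    # never leaving an empty stem; order follows the suffix list.
    candidates = [
        word[: -len(sfx)]
        for sfx in suffixes
        if word.endswith(sfx) and len(word) > len(sfx)
    ]
    for cand in candidates:
        if cand in root_words:
            return cand  # the root-word list confirms this stem
    # No confirmed stem: fall back to the longest-suffix strip, else the word.
    return candidates[0] if candidates else word


if __name__ == "__main__":
    suffixes = generate_suffix_sequences(RULES)
    roots = {"lora", "manuh", "ghor"}  # HYPOTHETICAL stand-in for the root-word list
    for w in ["loratuk", "manuhbore", "ghor", "kitaptu"]:
        print(w, "->", stem(w, suffixes, roots))
```

Running the script prints `loratuk -> lora`, `manuhbore -> manuh`, `ghor -> ghor`, and `kitaptu -> kitap`. The membership check against `root_words` is where a root-word list, such as the one of roughly 20,000 words mentioned in the abstract, would plug in; without it, the sketch can only trust the longest suffix match, as in the `kitaptu` case.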