Analysis and evaluation of stemming algorithms: a case study with Assamese

  • Authors:
  • Navanath Saharia; Utpal Sharma; Jugal Kalita

  • Affiliations:
  • Tezpur University, Napaam, India; Tezpur University, Napaam, India; University of Colorado, Colorado Springs

  • Venue:
  • Proceedings of the International Conference on Advances in Computing, Communications and Informatics
  • Year:
  • 2012

Abstract

Stemming is the process of automatically extracting the base form of a given word. Assamese is a morphologically rich Indo-Aryan language with relatively free word order. It is spoken in the north-eastern part of India and written in the Assamese-Bengali script. Because Assamese is among the less computationally studied languages, our aim is to extract the stem of a given Assamese word. We adopt the suffix stripping approach along with a rule engine that generates all possible suffix sequences. The suffix stripping approach reached 82% accuracy after we added a root-word list of approximately 20,000 words.
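The approach described above combines three parts: a rule engine that enumerates suffix sequences, a stripper that removes them, and a root-word list. The sketch below shows how they might fit together. It is not the authors' implementation, and several details are assumptions. The suffix classes in `SUFFIX_SLOTS` are hypothetical and romanized for readability; they are not taken from the paper, which works in the Assamese-Bengali script. The root list is assumed to validate candidate stems. When no candidate is in the list, the sketch falls back to the longest possible strip. The `min_stem_len` guard and the `accuracy` helper (an exact-match rate over gold word/stem pairs) are also illustrative choices.

```python
from itertools import product

# HYPOTHETICAL romanized suffix classes, for illustration only; they are not
# the rule set from the paper. "" means the slot may be left empty.
SUFFIX_SLOTS = [
    ["", "bor", "bilak"],                         # assumed plural markers
    ["", "e", "k", "r", "t", "or", "ot", "loi"],  # assumed case markers
]


def generate_suffix_sequences(slots):
    """Rule-engine sketch: build every ordered combination of one choice per slot."""
    seqs = {"".join(parts) for parts in product(*slots)}
    seqs.discard("")
    # Longest first (ties broken alphabetically) so the fullest sequence is tried first.
    return sorted(seqs, key=lambda s: (-len(s), s))


def stem(word, suffix_seqs, roots, min_stem_len=2):
    """Strip a suffix sequence, preferring a result found in the root-word list."""
    if word in roots:
        return word
    candidates = [
        word[: -len(s)]
        for s in suffix_seqs
        if word.endswith(s) and len(word) - len(s) >= min_stem_len
    ]
    for cand in candidates:
        if cand in roots:
            return cand
    # No root-list hit: fall back to the longest strip, or leave the word as is.
    return candidates[0] if candidates else word


def accuracy(gold_pairs, suffix_seqs, roots):
    """Fraction of (word, gold_stem) pairs whose predicted stem matches exactly."""
    correct = sum(stem(w, suffix_seqs, roots) == g for w, g in gold_pairs)
    return correct / len(gold_pairs)


if __name__ == "__main__":
    seqs = generate_suffix_sequences(SUFFIX_SLOTS)
    roots = {"manuh", "ghar"}  # toy stand-in for the ~20,000-word root list
    print(stem("manuhbilakor", seqs, roots))  # manuh
    print(stem("gharot", seqs, roots))        # ghar
    print(stem("kitapbor", seqs, roots))      # kitap (fallback: not in roots)
```

Trying longer suffix sequences first prevents a short case marker such as `t` from winning over `ot`. The root-list check lets a known stem override the purely length-based choice. This is one plausible reading of why the root-word list improves accuracy, not a claim about the paper's exact procedure.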