An Amharic stemmer: reducing words to their citation forms

Authors:
Atelach Alemu Argaw; Lars Asker
Affiliations:
Stockholm University/KTH, Sweden; Stockholm University/KTH, Sweden
Venue:
Semitic '07 Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources
Year:
2007

Abstract

Stemming is an important analysis step in a number of areas, such as natural language processing (NLP), information retrieval (IR), machine translation (MT) and text classification. In this paper we present the development of a stemmer for Amharic that reduces words to their citation forms. Amharic is a Semitic language with rich and complex morphology. The stemmer is intended for dictionary-based cross-language IR, where the translation step requires looking up terms in a machine-readable dictionary (MRD). We apply a rule-based approach supplemented by occurrence statistics of words in an MRD and in a 3.1-million-word news corpus. The main purpose of the statistical supplements is to resolve ambiguity between alternative segmentations. The stemmer is evaluated on Amharic text from two domains: news articles and a classic fiction text. It achieves an accuracy of 60% on the old-fashioned fiction text and 75% on the news articles.
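The abstract describes the approach only at a high level. As a rough illustration of how rule-based affix stripping can be combined with dictionary and corpus statistics to choose between alternative segmentations, the following Python sketch generates candidate stems and ranks them. Everything specific here is a hypothetical assumption for illustration: the Latin transliteration, the affix lists, the minimum stem length, the ranking order (MRD membership, then corpus frequency, then longer stem) and the names `candidate_segmentations` and `choose_stem`. It is not the authors' rule set or scoring method, and real Amharic morphology (written in Ge'ez script, with affixes that can fuse with stem vowels) is considerably more complex.

```python
from collections import Counter

# Hypothetical affix inventories in a simplified Latin transliteration.
# Illustrative only: NOT the rule set of the stemmer described above.
PREFIXES = ("ye", "be", "le", "ke")
SUFFIXES = ("oc", "u", "n", "wa")
MIN_STEM_LEN = 2  # assumed lower bound to avoid stripping a word down to nothing


def candidate_segmentations(word: str) -> set[str]:
    """Return every stem reachable by stripping at most one prefix and then
    any sequence of suffixes, keeping only stems of MIN_STEM_LEN or more."""
    starts = {word}
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM_LEN:
            starts.add(word[len(p):])
    results: set[str] = set()
    frontier = list(starts)
    while frontier:
        stem = frontier.pop()
        if stem in results:
            continue
        results.add(stem)
        # Each matching suffix yields a further, shorter candidate.
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM_LEN:
                frontier.append(stem[: -len(s)])
    return results


def choose_stem(word: str, mrd: set[str], corpus_freq: Counter) -> str:
    """Resolve ambiguity between segmentations (assumed ranking):
    1. candidates found in the MRD win,
    2. then higher corpus frequency,
    3. then the longer candidate (conservative: avoids over-stemming),
    4. then the string itself, so the result is deterministic."""
    candidates = candidate_segmentations(word)
    return max(candidates, key=lambda c: (c in mrd, corpus_freq[c], len(c), c))


if __name__ == "__main__":
    mrd = {"bet", "ketema"}  # toy MRD entries: 'house', 'city'
    freq = Counter({"bet": 120, "betoc": 15})  # toy corpus counts
    print(choose_stem("yebetocu", mrd, freq))  # prints "bet"
```

Ranking MRD entries first mirrors the stated goal of producing forms that can be looked up in a dictionary. For a word such as "ketema" ('city'), the rules also propose "tema" because the word happens to begin with "ke", and the MRD check keeps the full form. When neither the MRD nor the corpus offers any evidence, preferring the longest candidate leaves the word unsegmented rather than guessing.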