Part of speech tagging for Amharic using conditional random fields
Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Information Retrieval
Methods for Amharic part-of-speech tagging
AfLaT '09 Proceedings of the First Workshop on Language Technologies for African Languages
Introduction to the special issue on African Language Technology
Language Resources and Evaluation
Stemming is an important analysis step in a number of areas such as natural language processing (NLP), information retrieval (IR), machine translation (MT) and text classification. In this paper we present the development of a stemmer for Amharic that reduces words to their citation forms. Amharic is a Semitic language with rich and complex morphology. The stemmer is intended for dictionary-based cross-language IR, where the translation step requires looking up terms in a machine-readable dictionary (MRD). We apply a rule-based approach supplemented by occurrence statistics of words in an MRD and in a 3.1M-word news corpus. The main purpose of the statistical supplements is to resolve ambiguity between alternative segmentations. The stemmer is evaluated on Amharic text from two domains: news articles and a classic fiction text. It achieves an accuracy of 60% on the old-fashioned fiction text and 75% on the news articles.
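The general idea of rule-based segmentation with statistical disambiguation can be illustrated with a minimal sketch. This is not the stemmer described in the paper. The affix lists, the Latin-script word forms, the function names and the ranking order (dictionary membership first, then corpus frequency, then shortest stem) are all hypothetical choices made for illustration, and real Amharic morphology is far richer than simple prefix and suffix stripping.

```python
from collections import Counter

# Hypothetical toy affix lists (Latin-script placeholders), NOT the paper's rules.
PREFIXES = ("", "ye", "be")
SUFFIXES = ("", "u", "och", "ochu")


def candidate_segmentations(word):
    """Yield every (prefix, stem, suffix) split allowed by the toy affix lists."""
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            if stem:  # never propose an empty stem
                yield prefix, stem, suffix


def stem_word(word, dictionary, corpus_freq):
    """Choose one stem among the alternative segmentations.

    Assumed ranking: stems found in the dictionary win, ties are broken
    by corpus frequency, and remaining ties by the shortest stem.
    """
    candidates = list(candidate_segmentations(word))
    if not candidates:
        return word

    def score(candidate):
        _, stem, _ = candidate
        return (stem in dictionary, corpus_freq.get(stem, 0), -len(stem))

    return max(candidates, key=score)[1]


# Hypothetical stand-ins for the MRD headwords and the corpus counts.
dictionary = {"bet", "lij"}
corpus_freq = Counter({"bet": 120, "betoch": 3, "lij": 80})

for w in ("yebetochu", "betoch", "belij", "xyz"):
    print(w, "->", stem_word(w, dictionary, corpus_freq))
```

In this sketch the dictionary check plays the role of the MRD lookup and the counter plays the role of the corpus statistics. A word with no dictionary-backed segmentation falls back to the candidate with the highest corpus count, or to the shortest candidate stem when no counts are available.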