Benchmarking and assessing the performance of Arabic stemmers

Authors:
Mohammed N. Al-Kabi; Qasem A. Al-Radaideh; Khalid W. Akkawi
Affiliations:
Department of Computer Information Systems, Faculty of Information Technology, Yarmouk University, Irbid, Jordan; Department of Computer Information Systems, Faculty of Information Technology, Yarmouk University, Irbid, Jordan; eBECS Ltd, Amman, Jordan
Venue:
Journal of Information Science
Year:
2011

Abstract

Previous studies on stemming for Arabic lack fair evaluation, full descriptions of the algorithms used, or access to the source code of the stemmers and the datasets used to evaluate them. Releasing source code and datasets is an essential step toward enabling researchers to improve the stemmers currently in use and to verify the results of these studies. This study lays the foundation for a benchmark for Arabic stemmers and presents an evaluation of four heavy (root-based) stemmers for Arabic: the Al-Mustafa, Al-Sarhan, Rabab’ah, and Taghva stemmers. The evaluation assesses the accuracy of each stemmer and measures its strength. The accuracy and strength tests used in this study ranked the Rabab’ah stemmer first, followed by the Al-Sarhan, Al-Mustafa, and Taghva stemmers, respectively.
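The abstract does not define the accuracy and strength tests, so the following is only a sketch under stated assumptions. It takes accuracy to be the fraction of words whose extracted root exactly matches a gold-standard root. It quantifies strength with two of the measures proposed by Frakes and Fox: the mean number of words per conflation class and the index compression factor, (N - S) / N, where N is the number of unique words and S the number of unique stems. The transliterated word list, the lookup-table "stemmer", and the names `accuracy`, `strength`, `GOLD_ROOTS`, and `TOY_OUTPUT` are hypothetical illustrations. They are not the stemmers, data, or procedure evaluated in the study.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set

# Hypothetical gold standard: transliterated Arabic words mapped to their roots.
GOLD_ROOTS: Dict[str, str] = {
    "kitab": "ktb", "kutub": "ktb", "maktab": "ktb",
    "darasa": "drs", "madrasa": "drs",
}

# Hypothetical stemmer output (a lookup table standing in for a real stemmer);
# it deliberately gets "maktab" wrong so that accuracy drops below 1.0.
TOY_OUTPUT: Dict[str, str] = {
    "kitab": "ktb", "kutub": "ktb", "maktab": "mkt",
    "darasa": "drs", "madrasa": "drs",
}


def accuracy(stem: Callable[[str], str], gold: Dict[str, str]) -> float:
    """Fraction of gold-standard words whose stemmer output equals the gold root."""
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(1 for word, root in gold.items() if stem(word) == root)
    return correct / len(gold)


def strength(stem: Callable[[str], str], words: Iterable[str]) -> Dict[str, float]:
    """Two Frakes-and-Fox strength measures computed over the unique input words."""
    unique = set(words)
    if not unique:
        raise ValueError("word list is empty")
    # Group words into conflation classes keyed by the stem they map to.
    classes: Dict[str, Set[str]] = defaultdict(set)
    for word in unique:
        classes[stem(word)].add(word)
    n_words, n_stems = len(unique), len(classes)
    return {
        # Average number of distinct words conflated to each stem.
        "mean_words_per_class": n_words / n_stems,
        # Share of the vocabulary eliminated by conflation: (N - S) / N.
        "index_compression_factor": (n_words - n_stems) / n_words,
    }


if __name__ == "__main__":
    toy_stem = TOY_OUTPUT.__getitem__
    print(accuracy(toy_stem, GOLD_ROOTS))  # 0.8
    print(strength(toy_stem, GOLD_ROOTS.keys()))
```

A stronger stemmer conflates more words per class and therefore yields a higher compression factor, but strength alone says nothing about whether the conflations are correct. Accuracy against a gold standard and strength therefore complement each other when comparing root-based stemmers.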