Enhanced Algorithm for Extracting the Root of Arabic Words

Authors:
Sameh Ghwanmeh;Ghassan Kanaan;Riyad Al-Shalabi;Saif Rabab'ah
Venue:
CGIV '09 Proceedings of the 2009 Sixth International Conference on Computer Graphics, Imaging and Visualization
Year:
2009

Citing 0
Cited 3

Benchmarking and assessing the performance of Arabic stemmers
Journal of Information Science

The Effect of Stemming on Arabic Text Classification: An Empirical Study
International Journal of Information Retrieval Research

Comparing Different Sparse Matrix Storage Structures as Index Structure for Arabic Text Collection
International Journal of Information Retrieval Research


Abstract

Stemming is one of many tools used in information retrieval to combat the vocabulary mismatch problem, in which query words do not match document words. Arabic stemming does not fit the usual mold: most stemming research for other languages relies only on removing prefixes and suffixes, but Arabic words contain infixes as well. In this paper we introduce an enhanced root-based algorithm that handles all kinds of affixes (prefixes, suffixes, and infixes) according to the morphological pattern of the word. A series of simulation experiments was conducted to test the performance of the proposed algorithm. The results show that the algorithm extracts the correct roots with an accuracy rate of up to 95%.
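
The general idea of pattern-based root extraction can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. The affix lists, the pattern list, and the function names (`match_pattern`, `candidate_stems`, `extract_root`) are all assumptions chosen for illustration. Patterns follow the traditional convention of using the letters ف, ع, ل as placeholders for the three root consonants; any other letter in a pattern is treated as a fixed affix or infix that must match literally.

```python
# Illustrative, deliberately tiny affix and pattern lists (assumptions,
# not taken from the paper).
PREFIXES = ["ال", "و", "ب"]
SUFFIXES = ["ون", "ات", "ة"]
PATTERNS = ["فاعل", "مفعول", "فعال", "فعيل", "فعل"]
ROOT_SLOTS = "فعل"  # placeholder letters marking root positions


def match_pattern(word, pattern):
    """Return the root letters if word fits pattern, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)   # root consonant: keep it
        elif w != p:
            return None      # fixed pattern letter (e.g. an infix) must match
    return "".join(root)


def candidate_stems(word):
    """Return the word plus forms with one prefix and/or one suffix removed."""
    stems = [word]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            stems.append(word[len(prefix):])
    for stem in list(stems):
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= 3:
                stems.append(stem[:-len(suffix)])
    return stems


def extract_root(word):
    """Try each candidate stem against each pattern; return the first root found."""
    for stem in candidate_stems(word):
        for pattern in PATTERNS:
            root = match_pattern(stem, pattern)
            if root is not None:
                return root
    return None
```

Under these assumptions, `extract_root("مكتوب")`, `extract_root("كاتب")`, and `extract_root("الكاتبون")` all return `"كتب"`: the first two drop the infixes fixed by the patterns مفعول and فاعل, and the third first strips the prefix ال and the suffix ون. A realistic stemmer would need far larger affix and pattern inventories and a way to resolve words that fit more than one pattern.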