Stemming the Qur'an

Authors: Naglaa Thabet
Affiliations: University of Newcastle, Newcastle upon Tyne, UK
Venue: Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
Year: 2004

Abstract

In natural language, a stem is the morphological base of a word to which affixes can be attached to form derivatives. Stemming is a process of assigning morphological variants of words to equivalence classes such that each class corresponds to a single stem. Different stemmers have been developed for a wide range of languages and for a variety of purposes. Arabic, a highly inflected language with complex orthography, requires good stemming for effective text analysis. Preliminary investigation indicates that existing approaches to Arabic stemming fail to provide effective and accurate equivalence classes when applied to a text like the Qur'an written in Classical Arabic. Therefore, I propose a new stemming approach based on a light stemming technique that uses a transliterated version of the Qur'an in western script.
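
The idea of light stemming over transliterated text can be illustrated with a minimal sketch. It is not the stemmer proposed in the paper: the affix lists, the minimum stem length, and the function names light_stem and equivalence_classes are all assumptions chosen for illustration, and the sample words are simplified Latin-script transliterations without diacritics. The sketch strips at most one prefix and one suffix from each word, then groups words that share the resulting stem into one equivalence class.

```python
from collections import defaultdict

# Hypothetical affix lists for illustration only; they are NOT the lists
# used in the paper. Longer affixes are listed first so they match first.
PREFIXES = ("wal", "bal", "fal", "al", "wa", "bi", "li")
SUFFIXES = ("una", "ina", "hum", "at", "an", "un", "in", "hu", "ha")

# Assumed lower bound on stem length, so short words are not over-stripped.
MIN_STEM_LEN = 3


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix from a transliterated word."""
    stem = word.lower()
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            stem = stem[:-len(suffix)]
            break
    return stem


def equivalence_classes(words):
    """Group words by their light stem; each group is one equivalence class."""
    classes = defaultdict(list)
    for word in words:
        classes[light_stem(word)].append(word)
    return dict(classes)


if __name__ == "__main__":
    sample = ["kitab", "alkitab", "walkitab", "kitabun", "muslim", "almuslimuna"]
    for stem, members in equivalence_classes(sample).items():
        print(stem, members)
```

With these assumed lists, the four variants of "kitab" fall into one class and the two variants of "muslim" into another. A light stemmer of this kind removes only surface affixes and does not attempt root extraction, so its accuracy depends entirely on how well the affix lists fit the text.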