A comparison study of some Arabic root finding algorithms

Authors:
Emad Al-Shawakfa; Amer Al-Badarneh; Safwan Shatnawi; Khaleel Al-Rabab'ah; Basel Bani-Ismail
Affiliations:
Computer Information Systems Department, Yarmouk University, Irbid 21163, Jordan; Computer Information Systems Department, Jordan University of Science and Technology, Irbid, Jordan; Applied Studies College, University of Bahrain, Sakhir, Bahrain; Applied Studies College, University of Bahrain, Sakhir, Bahrain; Applied Studies College, University of Bahrain, Sakhir, Bahrain
Venue:
Journal of the American Society for Information Science and Technology
Year:
2010


Abstract

Arabic has a complex structure, which makes it difficult to apply natural language processing (NLP) to it. Much research on Arabic NLP (ANLP) does exist; however, it is not as mature as the research for other languages. Finding Arabic roots is an important step toward effective research on most ANLP applications. The authors studied and compared six root-finding algorithms with success rates of over 90%. Because these algorithms had not been evaluated on the same testing corpus or with the same benchmarking measures, the authors unified the testing process: they implemented each algorithm from its own description and built a corpus by applying 73 triliteral patterns and 18 affixes to 3823 triliteral roots, producing around 27.6 million words. They tested the algorithms on this generated corpus, obtained interesting results, and offer to share the corpus freely for benchmarking and ANLP research. © 2010 Wiley Periodicals, Inc.
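The corpus construction and benchmarking idea in the abstract can be illustrated with a minimal sketch. It assumes that patterns are written with the conventional placeholder letters ف, ع, ل for the three radicals, and that each word takes at most one prefix and one suffix. The roots, patterns, and affixes below are tiny hypothetical samples, not the authors' 3823 roots, 73 patterns, or 18 affixes. The function names (`apply_pattern`, `build_corpus`, `find_root`, `success_rate`) are illustrative. `find_root` is a toy pattern-matching root finder, not one of the six algorithms compared in the study. "Success rate" here is simply the fraction of generated words whose source root is recovered, and the paper's exact measure may differ.

```python
"""Hypothetical sketch: generate a tiny root/pattern/affix test corpus and
score a toy root finder on it. Sample data, names, and the finder are
illustrative assumptions, not the authors' corpus or algorithms."""

from itertools import product
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Patterns mark the three radicals of a triliteral root with the
# conventional placeholder letters fa, ain, lam.
PLACEHOLDERS = ("ف", "ع", "ل")

# Tiny hypothetical samples ("" means "no affix").
ROOTS = ["كتب", "درس"]
PATTERNS = ["فاعل", "مفعول"]
PREFIXES = ["", "ال"]
SUFFIXES = ["", "ون"]


def apply_pattern(root: str, pattern: str) -> str:
    """Substitute the root's radicals for the placeholders in the pattern."""
    radicals = dict(zip(PLACEHOLDERS, root))
    return "".join(radicals.get(ch, ch) for ch in pattern)


def build_corpus(roots, patterns, prefixes, suffixes) -> Iterator[Tuple[str, str]]:
    """Yield (word, source_root) for every root x pattern x prefix x suffix."""
    for root, pattern, pre, suf in product(roots, patterns, prefixes, suffixes):
        yield pre + apply_pattern(root, pattern) + suf, root


def match_pattern(stem: str, pattern: str) -> Optional[str]:
    """Return the radicals if the stem fits the pattern, else None.
    Assumes each pattern contains every placeholder exactly once."""
    if len(stem) != len(pattern):
        return None
    radicals = {}
    for letter, slot in zip(stem, pattern):
        if slot in PLACEHOLDERS:
            radicals[slot] = letter
        elif letter != slot:
            return None
    return "".join(radicals[p] for p in PLACEHOLDERS)


def find_root(word, patterns, prefixes, suffixes) -> Optional[str]:
    """Toy finder: strip affixes (longest first), then match known patterns."""
    for pre in sorted(prefixes, key=len, reverse=True):
        if not word.startswith(pre):
            continue
        for suf in sorted(suffixes, key=len, reverse=True):
            if not word.endswith(suf) or len(pre) + len(suf) > len(word):
                continue
            stem = word[len(pre):len(word) - len(suf)]
            for pattern in patterns:
                root = match_pattern(stem, pattern)
                if root is not None:
                    return root
    return None


def success_rate(corpus: Iterable[Tuple[str, str]],
                 finder: Callable[[str], Optional[str]]) -> float:
    """Fraction of words whose source root the finder recovers exactly."""
    pairs = list(corpus)
    hits = sum(1 for word, root in pairs if finder(word) == root)
    return hits / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    corpus = list(build_corpus(ROOTS, PATTERNS, PREFIXES, SUFFIXES))
    rate = success_rate(corpus, lambda w: find_root(w, PATTERNS, PREFIXES, SUFFIXES))
    print(len(corpus), rate)  # 16 1.0
```

For example, applying the pattern مفعول to the root كتب gives مكتوب, and with the prefix ال and suffix ون the corpus contains الكاتبون, from which the toy finder recovers كتب. Generating words from known roots means every word carries its correct answer, so any root finder can be scored on the same data. Note that one affix per word over the paper's counts would give 3823 × 73 × 18 = 5,023,422 words, not 27.6 million, so this sketch does not reproduce the authors' exact affix-combination scheme.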