Root extraction is an important task in information retrieval (IR), natural language processing (NLP), text summarization, and related fields. Over the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms handle triliteral roots only, and some handle fixed-length words only. This study proposes a novel approach to extracting roots from Arabic words using bigrams. Two measures are used: the Manhattan distance, a dissimilarity measure, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qur'an and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files cover a wide range of data: the Holy Qur'an contains most of the ancient Arabic words, while the abstracts corpus contains modern Arabic words and words borrowed from foreign languages in addition to original Arabic words. The results show that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure. © 2010 Wiley Periodicals, Inc.
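To make the two measures concrete, the following is a minimal sketch of how a word might be compared against candidate roots using bigrams. It is not the authors' algorithm: the abstract does not say how bigrams are formed or how candidates are generated, so the use of adjacent character bigrams, the candidate list, and all function names here are assumptions made for illustration.

```python
from collections import Counter


def bigrams(word):
    """Return the adjacent character bigrams of a word (an assumed definition)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice_similarity(a, b):
    """Dice's coefficient over bigram sets: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        return 0.0  # neither string is long enough to form a bigram
    return 2 * len(x & y) / (len(x) + len(y))


def manhattan_distance(a, b):
    """Manhattan (L1) distance between the bigram count vectors of two strings."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    return sum(abs(x[g] - y[g]) for g in set(x) | set(y))


def best_root_by_dice(word, roots):
    """Pick the candidate root with the highest Dice similarity to the word."""
    return max(roots, key=lambda r: dice_similarity(word, r))


def best_root_by_manhattan(word, roots):
    """Pick the candidate root with the lowest Manhattan distance to the word."""
    return min(roots, key=lambda r: manhattan_distance(word, r))


# Hypothetical candidate roots, chosen only for this example.
candidate_roots = ["كتب", "درس", "كتم"]
word = "كاتب"

print(best_root_by_dice(word, candidate_roots))
print(best_root_by_manhattan(word, candidate_roots))
```

Dice's coefficient rewards shared bigrams relative to the combined size of both bigram sets, whereas the Manhattan distance penalizes every bigram that appears in one string but not the other. Because an Arabic word is usually longer than its root, this penalty grows with affixes and infixes, which is one plausible reason a similarity measure could behave differently from a distance measure in this setting.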