An auto-indexing method for Arabic text

Authors:
Nashat Mansour; Ramzi A. Haraty; Walid Daher; Manal Houri
Affiliations:
Division of Computer Science and Mathematics, Lebanese American University, P.O. Box 13-5053, Chouran, Beirut 1102 3801, Lebanon (all authors)
Venue:
Information Processing and Management: an International Journal
Year:
2008

Abstract

This work addresses the information retrieval problem of auto-indexing Arabic documents, that is, automatically extracting words that are suitable for building an index for a document. We propose an auto-indexing method for Arabic text documents that is based mainly on morphological analysis and on a technique for assigning weights to words. The morphological analysis applies a number of grammatical rules to extract stem words, which become candidate index words. The weight assignment technique computes a weight for each candidate word relative to the document that contains it. The weight reflects how widely the word is spread across the document, not only how often it occurs. The candidate index words are then sorted in descending order of weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples, for which we obtained an average recall of 46% and an average precision of 64%.
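
The weighting and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the one used here (relative frequency multiplied by the fraction of equal-sized document segments in which a stem appears) is only an assumption chosen to show how spread can count alongside rate of occurrence; it is not the authors' formula. The function names, the num_segments parameter, and the transliterated stems are hypothetical, and the input is assumed to have already been reduced to stems by a morphological analyzer.

```python
from collections import Counter


def rank_index_candidates(stems, num_segments=4):
    """Rank stems by an assumed weight: relative frequency x spread.

    `stems` is the document as an ordered list of stem words (assumed to be
    the output of a morphological analysis step that is not shown here).
    Spread is the fraction of equal-sized segments that contain the stem.
    """
    if not stems:
        return []
    total = len(stems)
    num_segments = max(1, min(num_segments, total))
    counts = Counter(stems)

    # Count, for every stem, how many segments of the document contain it.
    segments_with = Counter()
    for s in range(num_segments):
        start = s * total // num_segments
        end = (s + 1) * total // num_segments
        for stem in set(stems[start:end]):
            segments_with[stem] += 1

    weights = {
        stem: (counts[stem] / total) * (segments_with[stem] / num_segments)
        for stem in counts
    }
    # Descending by weight; ties are broken alphabetically for determinism.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


def precision_recall(selected, relevant):
    """Precision and recall of selected index words against a reference set."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical transliterated stems standing in for an analyzed document.
doc = ["ktb", "ktb", "ktb", "drs", "qlm", "drs", "elm", "drs"]
ranked = rank_index_candidates(doc, num_segments=4)
top_words = [stem for stem, _ in ranked[:3]]
p, r = precision_recall(top_words, ["drs", "ktb", "qlm", "slm"])
```

In this toy input, "ktb" and "drs" both occur three times, but "drs" appears in three of the four segments while "ktb" appears in only two, so "drs" ranks higher under the assumed formula. Any other spread measure could be substituted without changing the sort and evaluation steps.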