Improving Arabic information retrieval system using N-gram method

Authors:
Rammal Mahmoud; Sanan Majed; Zreik Khaldoun
Affiliations:
Legal Informatics Center, Lebanese University, Lebanon; Faculty of Science, Lebanese University, Lebanon; Department Hypermedia, Paris 8 University, France
Venue:
WSEAS Transactions on Computers
Year:
2011

Abstract

This paper applies N-gram-based indexing and retrieval to the Arabic legal language used in documents from the official journal of the Lebanese government. We represent documents with both word-based and character-based N-grams and compare the retrieval results in the vector space model using three measures: TF*IDF weighting, Dice's coefficient, and the cosine coefficient. The experiments show that trigrams are the optimal choice for indexing Arabic documents when N-grams are used for Arabic information retrieval. However, N-grams alone are still insufficient for indexing and retrieving legal Arabic documents with good results, and a linguistic approach that uses a legal thesaurus or an ontology for juridical language is indispensable.
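To make the comparison described in the abstract concrete, here is a minimal Python sketch of character N-gram indexing scored with the three measures. It is an illustration under stated assumptions, not the authors' implementation. It assumes that character N-grams are contiguous and taken inside word boundaries, with words shorter than n kept whole. It also assumes that TF*IDF weights are raw term frequency times log(N/df), and that the "TF*IDF" score is the sum of a document's TF*IDF weights over the distinct query N-grams. All function names (`char_ngrams`, `word_ngrams`, `rank`, and so on) are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Contiguous character n-grams inside each whitespace-separated word.
    Assumption: words shorter than n are kept whole as a single gram."""
    grams: list[str] = []
    for word in text.split():
        if len(word) < n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def word_ngrams(text: str, n: int = 1) -> list[str]:
    """Word n-grams, each joined with a single space."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Weight each gram by tf * log(N / df) (assumed weighting variant)."""
    n_docs = len(docs)
    df = Counter(g for doc in docs for g in set(doc))
    idf = {g: math.log(n_docs / d) for g, d in df.items()}
    vectors = [{g: tf * idf[g] for g, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine coefficient between two sparse weight vectors."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def dice(a: set[str], b: set[str]) -> float:
    """Dice's coefficient on gram sets: 2|A & B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def tfidf_score(query: list[str], doc_vec: dict[str, float]) -> float:
    """Assumed TF*IDF score: sum of document weights for distinct query grams."""
    return sum(doc_vec.get(g, 0.0) for g in set(query))


def rank(query: str, docs: list[str], n: int = 3) -> dict[str, list[int]]:
    """Return document indices ordered best-first under each measure."""
    doc_grams = [char_ngrams(d, n) for d in docs]
    q_grams = char_ngrams(query, n)
    vectors, idf = tfidf_vectors(doc_grams)
    # Query grams unseen in the collection get weight 0.
    q_vec = {g: tf * idf.get(g, 0.0) for g, tf in Counter(q_grams).items()}
    scores = {
        "tfidf": [tfidf_score(q_grams, v) for v in vectors],
        "cosine": [cosine(q_vec, v) for v in vectors],
        "dice": [dice(set(q_grams), set(g)) for g in doc_grams],
    }
    # sorted() is stable, so tied documents keep their original order.
    return {name: sorted(range(len(docs)), key=lambda i: -s[i]) for name, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical toy collection; real legal texts would also need Arabic normalization.
    collection = ["قانون العمل اللبناني", "مرسوم تنظيم الضرائب", "قانون الضرائب على الدخل"]
    print(rank("قوانين العمل", collection, n=3))
```

Changing `n` lets the same pipeline compare bigrams, trigrams, and longer grams. Swapping `char_ngrams` for `word_ngrams` gives the word-based representation, which is the kind of comparison the abstract reports.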