Improving Arabic information retrieval system using N-gram method

  • Authors:
  • Rammal Mahmoud; Sanan Majed; Zreik Khaldoun

  • Affiliations:
  • Legal Informatics Center, Lebanese University, Lebanon; Faculty of Science, Lebanese University, Lebanon; Department Hypermedia, Paris 8 University, France

  • Venue:
  • WSEAS Transactions on Computers
  • Year:
  • 2011

Abstract

This paper applies N-gram-based indexing and retrieval to the Arabic legal language used in documents of the official Lebanese government journal. We use N-grams, built from both words and characters, as the representation method, and compare the results within the vector space model using three similarity measures: TF*IDF weighting, Dice's coefficient, and the cosine coefficient. The experiments show that indexing Arabic documents with trigrams is the best choice for N-gram-based Arabic information retrieval. However, N-grams alone are still insufficient for indexing and retrieving Arabic legal documents with good results; a linguistic approach that uses a legal thesaurus or an ontology of juridical language is indispensable.
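
As a rough illustration of the character-trigram representation and two of the similarity measures named in the abstract, the following Python sketch may help. It is not the authors' implementation: the function names (`char_ngrams`, `dice`, `cosine`), the whitespace tokenization, the handling of words shorter than n, and the use of raw counts rather than TF*IDF weights are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Split text on whitespace and return the overlapping character
    n-grams of each word (hypothetical helper; words shorter than n
    are kept whole, which is an assumption of this sketch)."""
    grams = []
    for word in text.split():
        if len(word) < n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def dice(a, b):
    """Dice's coefficient on the sets of distinct n-grams: 2|A∩B| / (|A|+|B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine(a, b):
    """Cosine coefficient between raw n-gram count vectors
    (no TF*IDF weighting is applied in this sketch)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Two related Arabic words: "كتاب" (book) and "كتابة" (writing).
query = char_ngrams("كتاب")
document = char_ngrams("كتابة")
print(dice(query, document))  # 0.8 (two shared trigrams out of 2 + 3)
print(cosine(query, document))
```

The two words share two of their trigrams, so they score as similar without any stemming, which is the appeal of the character N-gram approach. The same sketch has no notion of legal synonyms or related concepts, which is consistent with the abstract's point that a thesaurus or ontology is still needed.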