On arabic search: improving the retrieval effectiveness via a light stemming approach

  • Authors:
  • Mohammed Aljlayl;Ophir Frieder

  • Affiliations:
  • Riyadh College of Technology, Riyadh, Saudi Arabia;Illinois Institute of Technology, Chicago, IL

  • Venue:
  • Proceedings of the eleventh international conference on Information and knowledge management
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

The inflectional structure of a word impacts the retrieval accuracy of information retrieval systems of Latin-based languages. We present two stemming algorithms for Arabic information retrieval systems. We empirically investigate the effectiveness of surface-based retrieval. This approach degrades retrieval precision since Arabic is a highly inflected language. Accordingly, we propose root-based retrieval. We notice a statistically significant improvement over the surface-based approach. Many variant word senses are based on an identical root; thus, the root-based algorithm creates invalid conflation classes that result in an ambiguous query which degrades the performance by adding extraneous terms. To resolve ambiguity, we propose a novel light-stemming algorithm for Arabic texts. This automatic rule-based stemming algorithm is not as aggressive as the root extraction algorithm. We show that the light stemming algorithm significantly outperforms the root-based algorithm. We also show that a significant improvement in retrieval precision can be achieved with light inflectional analysis of Arabic words.