Stemming arabic conjunctions and prepositions

  • Authors:
  • Abdusalam F. A. Nwesri;S. M. M. Tahaghoghi;Falk Scholer

  • Affiliations:
  • School of Computer Science and Information Technology, RMIT University, Melbourne, Australia;School of Computer Science and Information Technology, RMIT University, Melbourne, Australia;School of Computer Science and Information Technology, RMIT University, Melbourne, Australia

  • Venue:
  • SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike English, most affixes in Arabic are difficult to discriminate from the core word. Removing incorrectly identified affixes sometimes results in a valid but incorrect stem, and in most cases reduces retrieval precision. Conjunctions and prepositions form an interesting class of these affixes. In this work, we present novel approaches for dealing with these affixes. Unlike previous approaches, our approaches focus on retaining valid Arabic core words, while maintaining high retrieval performance.