Towards an error-free Arabic stemming

  • Authors: Eiman Tamah Al-Shammari; Jessica Lin

  • Affiliations: George Mason University / Kuwait University, Fairfax, VA, USA; George Mason University, Fairfax, USA

  • Venue: Proceedings of the 2nd ACM workshop on Improving non english web searching
  • Year: 2008

Abstract

Stemming is a computational process that reduces words to their roots (or stems). Within an information retrieval system, it can be classified as either a recall-enhancing or a precision-enhancing component. Existing Arabic stemmers suffer from high stemming error rates: they blindly stem every word and perform especially poorly on compound words, nouns, and foreign Arabized words. This paper presents the Educated Text Stemmer (ETS), a simple, dictionary-free, and highly effective Arabic stemming algorithm that reduces stemming errors while also decreasing computation time and data storage. The novelty of the work lies in its use of Arabic stop-words, which are usually neglected. These stop-words can be highly important and can significantly improve the processing of Arabic documents. ETS is evaluated by comparing its output with human-generated stemming and by using the stemming weight technique.
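
As a rough illustration of the first evaluation mentioned above (comparison against human-generated stems), the sketch below computes a simple stemming error rate. It is not the paper's evaluation code: the function name `stemming_error_rate`, the dictionary layout, and the four sample words are assumptions made for this example, and the stemming weight technique is not shown.

```python
def stemming_error_rate(predicted, reference):
    """Return the fraction of reference words whose predicted stem
    differs from the human-assigned stem.

    Both arguments map a surface word to a stem. A word missing from
    `predicted` counts as an error.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    errors = sum(
        1 for word, gold_stem in reference.items()
        if predicted.get(word) != gold_stem
    )
    return errors / len(reference)


# Illustrative data only (hypothetical, not taken from the paper).
# Human-assigned stems for four Arabic words:
reference = {
    "الكتاب": "كتاب",
    "كتابان": "كتاب",
    "المدرسة": "مدرسة",
    "مدارس": "مدرسة",
}

# Output of a hypothetical stemmer that leaves the last word unchanged:
predicted = {
    "الكتاب": "كتاب",
    "كتابان": "كتاب",
    "المدرسة": "مدرسة",
    "مدارس": "مدارس",
}

rate = stemming_error_rate(predicted, reference)
print(f"stemming error rate: {rate:.2f}")
```

A per-word error rate like this only measures exact agreement with the human stems. It does not distinguish over-stemming from under-stemming, which is the kind of distinction a weighted measure is meant to capture.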