Arabic retrieval revisited: morphological hole filling

  • Authors:
  • Kareem Darwish; Ahmed M. Ali

  • Affiliations:
  • Qatar Computing Research Institute, Doha, Qatar; Qatar Computing Research Institute, Doha, Qatar

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
  • Year:
  • 2012

Abstract

Due to Arabic's morphological complexity, Arabic retrieval benefits greatly from morphological analysis -- particularly stemming. However, the best-known stemming techniques do not handle linguistic phenomena such as broken plurals and malformed stems. In this paper, we propose a model of character-level morphological transformation that is trained on Wikipedia links, pairing each link's hypertext with the title of the page it points to. Using our model yields statistically significant improvements in Arabic retrieval over the best statistical stemming technique. The technique can potentially be applied to other languages.
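
The abstract does not say how the training pairs are gathered or how the transformation model is built, so the following is only a rough sketch of the general idea and not the authors' method. It assumes that training pairs come from piped wiki links of the form [[Page title|hypertext]], and it swaps in Python's difflib alignment as a stand-in for a learned character-level model. The function names (extract_pairs, char_edits, count_edits) are hypothetical, and the English irregular plurals are a toy analogue for Arabic broken plurals.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Assumption: training pairs come from piped wiki links, [[Page title|hypertext]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def extract_pairs(wikitext):
    """Return (hypertext, page_title) pairs found in piped wiki links."""
    return [(anchor.strip(), title.strip())
            for title, anchor in LINK_RE.findall(wikitext)]


def char_edits(source, target):
    """Return the character segments that differ between source and target.

    Each item is (source_segment, target_segment); an empty string on
    either side marks a pure insertion or deletion.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [(source[i1:i2], target[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def count_edits(pairs):
    """Count how often each character-level rewrite occurs across pairs."""
    counts = Counter()
    for anchor, title in pairs:
        counts.update(char_edits(anchor, title))
    return counts


# Toy English stand-in for hypertext that differs morphologically from its title.
sample = "Packs of [[wolf|wolves]] avoided the [[child|children]]."
pairs = extract_pairs(sample)
print(pairs)               # [('wolves', 'wolf'), ('children', 'child')]
print(count_edits(pairs))  # each rewrite seen once: ('ves', 'f') and ('ren', '')
```

Frequency counts like these could serve as raw material for estimating how likely a given rewrite is, but how the paper actually trains and applies its model at retrieval time is not stated in the abstract.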