Using N-grams for Arabic text searching

  • Authors:
  • Suleiman H. Mustafa; Qasem A. Al-Radaideh

  • Affiliations:
  • Department of Computer Information Systems, Yarmouk University, Irbid, Jordan; Department of Computer Information Systems, Yarmouk University, Irbid, Jordan

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2004

Abstract

N-grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of digram and trigram term conflation techniques in the context of Arabic free-text retrieval. It reports the results of applying the N-gram approach to a corpus of thousands of distinct textual words drawn from sources representing various disciplines. The results indicate that the digram method performs better than the trigram method with respect to both conflation precision and conflation recall. In either case, however, the N-gram approach does not appear to be an efficient conflation technique, because the peculiarities of the Arabic infix structure reduce the rate of correct N-gram matching.
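In N-gram conflation, each word is broken into overlapping character sequences of length N (digrams for N = 2, trigrams for N = 3), and words that share enough of them are treated as variants of the same term. The sketch below illustrates this under stated assumptions: it uses distinct, unpadded N-grams, the Dice coefficient as the similarity measure, and a hypothetical cut-off of 0.4. The function names, the threshold, and the per-term precision/recall definitions are illustrative choices, not details taken from the paper.

```python
"""Sketch: N-gram term conflation with the Dice coefficient.

Assumptions (not taken from the paper): distinct, unpadded character
n-grams; Dice similarity; a hypothetical cut-off THRESHOLD = 0.4.
"""

THRESHOLD = 0.4  # hypothetical similarity cut-off


def ngrams(word: str, n: int) -> set[str]:
    """Distinct character n-grams of `word`, without boundary padding."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a: str, b: str, n: int) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the n-gram sets of a and b."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0  # a word shorter than n has no n-grams to match
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def conflate(term: str, vocabulary: list[str], n: int,
             threshold: float = THRESHOLD) -> list[str]:
    """Vocabulary words whose n-gram similarity to `term` reaches the threshold."""
    return [w for w in vocabulary if dice(term, w, n) >= threshold]


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Conflation precision and recall for one term (hypothetical definitions)."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Three words built on the root k-t-b: kataba ("he wrote"),
    # katib ("writer") and maktub ("written"). The long-vowel infixes
    # (alif, waw) separate the root letters from one another.
    vocab = ["كتب", "كاتب", "مكتوب"]
    for n in (2, 3):
        found = conflate("كتب", vocab, n)
        p, r = precision_recall(found, set(vocab))
        print(n, found, round(p, 2), round(r, 2))
```

In this toy case, the alif infix in كاتب leaves only the digram تب in common with كتب and no trigram at all, so digrams conflate the pair while trigrams do not. كاتب and مكتوب share the root k-t-b but not a single digram. This is consistent with the abstract's observation that digrams outperform trigrams and that Arabic infixes reduce correct N-gram matching.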