A comparison of text-classification techniques applied to Arabic text

  • Authors:
  • Ghassan Kanaan;Riyad Al-Shalabi;Sameh Ghwanmeh;Hamda Al-Ma'adeed

  • Affiliations:
  • Arab Academy for Banking and Financial Services, Amman, Jordan;Arab Academy for Banking and Financial Services, Amman, Jordan;Computer Engineering Department, Yarmouk University, Jordan;Arab Academy for Banking and Financial Services, Amman, Jordan

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2009

Abstract

Many algorithms have been implemented for the problem of text classification. Most of the work in this area has been carried out for English text; very little research has addressed Arabic text. The nature of Arabic text differs from that of English text, and preprocessing Arabic text is more challenging. This paper presents an implementation of three automatic text-classification techniques for Arabic text. A corpus of 1445 Arabic text documents belonging to nine categories was automatically classified using the kNN, Rocchio, and naïve Bayes algorithms. The results show that naïve Bayes was the best performer, followed by kNN and Rocchio. © 2009 Wiley Periodicals, Inc.
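
For readers unfamiliar with the three techniques named in the abstract, the following is a minimal sketch of each on raw term counts. It is not the authors' implementation: the abstract does not describe their preprocessing, term weighting, or parameters. The function names, the whitespace tokenizer, the choice of k = 3, the add-one smoothing, and the toy Latin-script training data are all assumptions made for illustration, and the Rocchio variant shown is the simplified nearest-centroid form.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: plain whitespace split. Real Arabic preprocessing
    # (normalization, stemming, stop-word removal) is not attempted here.
    return text.split()

def vectorize(text):
    # Bag-of-words vector of raw term counts.
    return Counter(tokenize(text))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    # kNN: majority label among the k most similar training documents.
    query = vectorize(text)
    ranked = sorted(train, key=lambda pair: cosine(query, vectorize(pair[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def rocchio_predict(train, text):
    # Simplified Rocchio: one summed vector per category. Cosine similarity
    # ignores scale, so the sum ranks categories the same as the mean would.
    centroids = defaultdict(Counter)
    for doc, label in train:
        centroids[label].update(vectorize(doc))
    query = vectorize(text)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))

def naive_bayes_predict(train, text):
    # Multinomial naive Bayes with add-one (Laplace) smoothing.
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for doc, label in train:
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)

    def log_posterior(label):
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / len(train))
        for token in tokenize(text):
            score += math.log((word_counts[label][token] + 1)
                              / (total + len(vocab)))
        return score

    return max(doc_counts, key=log_posterior)

# Hypothetical toy data standing in for a labeled corpus.
train = [
    ("match team goal", "sports"),
    ("team player match", "sports"),
    ("goal player league", "sports"),
    ("bank market price", "economy"),
    ("market trade bank", "economy"),
    ("price trade economy", "economy"),
]

for classify in (knn_predict, rocchio_predict, naive_bayes_predict):
    print(classify.__name__, classify(train, "team goal league"))
```

On data this small all three classifiers agree; differences of the kind the abstract reports would only appear on a realistic corpus with proper preprocessing and evaluation.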