A comparative study for Arabic text classification algorithms based on stop words elimination

Authors:
Bassam Al-Shargabi;Waseem Al-Romimah;Fekry Olayah
Affiliations:
Al-Isra University, Amman-Jordan;University of Science and Technology, Sana'a-Yemen;Al-Isra University, Amman-Jordan
Venue:
Proceedings of the 2011 International Conference on Intelligent Semantic Web-Services and Applications
Year:
2011

Abstract

This paper compares three classifiers for Arabic text classification: Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayes (NB), and J48. The main objective is to measure the accuracy of each classifier and to determine which is the most accurate for Arabic text classification when stop words are eliminated. Accuracy is measured with the percentage split (holdout) method and with K-fold cross-validation. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and that it also requires the least time to build its model.
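
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop word set, the function names, and the unshuffled index-based splitting are all assumptions made for illustration, and no classifier is trained here. It only shows what stop word elimination, a percentage split, K-fold partitioning, and accuracy might look like in code.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical, very small stop word set for illustration only;
# the stop word list used in the paper is not given in this record.
ARABIC_STOP_WORDS: Set[str] = {"في", "من", "على", "إلى", "عن"}


def remove_stop_words(tokens: Sequence[str],
                      stop_words: Set[str] = ARABIC_STOP_WORDS) -> List[str]:
    """Drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


def holdout_split(n_items: int,
                  train_fraction: float) -> Tuple[List[int], List[int]]:
    """Percentage split: the first part trains, the remainder tests.

    Assumes the items were shuffled beforehand.
    """
    cut = int(n_items * train_fraction)
    return list(range(cut)), list(range(cut, n_items))


def k_fold_splits(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """K-fold cross-validation: each fold serves as the test set once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training indices are all folds except the current test fold.
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i
                 for j in fold]
        splits.append((train, test))
    return splits


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

With K-fold cross-validation, the reported accuracy would typically be the mean of the per-fold accuracies, and the error rate is one minus the accuracy.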