Performance of KNN and SVM classifiers on full word Arabic articles

Authors:
Ismail Hmeidi;Bilal Hawashin;Eyas El-Qawasmeh
Affiliations:
Faculty of Computer and Information Technology, Jordan University of Science and Technology, Irbid 22110, Jordan;Faculty of Computer and Information Technology, Jordan University of Science and Technology, Irbid 22110, Jordan;Faculty of Computer and Information Technology, Jordan University of Science and Technology, Irbid 22110, Jordan
Venue:
Advanced Engineering Informatics
Year:
2008

Abstract

This paper reports a comparative study of two machine learning methods for Arabic text categorization. Using one collection of news articles as the training set and another collection of news articles as the test set, we evaluated the K-nearest neighbor (KNN) and support vector machine (SVM) algorithms. We used full-word features, with tf.idf as the term weighting method and the CHI statistic as the ranking metric for feature selection. Experiments showed that both methods performed very well on the test corpus, while SVM achieved a better micro-averaged F1 score and a shorter prediction time.
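The abstract combines four ingredients: full-word features weighted by tf.idf, CHI-based feature ranking, a KNN classifier, and micro-averaged F1 for evaluation. The pure-Python sketch below shows one way these pieces fit together. It is a minimal illustration under stated assumptions, not the authors' implementation: the tf.idf variant (raw term frequency times log(N/df)), scoring each term by its maximum CHI value over categories, cosine similarity with majority voting for KNN, the toy Arabic documents, and all function names (chi_square, select_features, tfidf_vector, knn_predict, micro_f1) are hypothetical choices made here. The SVM side is not shown; a linear SVM would typically be trained on the same tf.idf vectors.

```python
import math
from collections import Counter

# Hypothetical sketch: the tf.idf variant, max-over-categories CHI scoring,
# and cosine-similarity KNN are illustrative assumptions, not the paper's settings.

def chi_square(doc_sets, labels, term, category):
    """CHI statistic of a term for one category, from a 2x2 table of document counts."""
    n = len(doc_sets)
    a = sum(1 for d, y in zip(doc_sets, labels) if term in d and y == category)      # present, in category
    b = sum(1 for d, y in zip(doc_sets, labels) if term in d and y != category)      # present, other category
    c = sum(1 for d, y in zip(doc_sets, labels) if term not in d and y == category)  # absent, in category
    e = n - a - b - c                                                                # absent, other category
    denom = (a + c) * (b + e) * (a + b) * (c + e)
    return n * (a * e - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, labels, k):
    """Keep the k terms with the highest CHI score (maximum over categories)."""
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    cats = set(labels)
    score = {t: max(chi_square(doc_sets, labels, t, c) for c in cats) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]  # ties broken alphabetically

def tfidf_vector(doc, features, df, n_docs):
    """Sparse tf.idf vector over the selected features: tf * log(N / df)."""
    tf = Counter(doc)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in features if tf[t] and df[t]}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training documents most cosine-similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: -cosine(query, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all categories before computing P and R."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy, hypothetical training data: already-tokenized full-word features.
train_docs = [
    ["مباراة", "هدف", "كرة"],
    ["هدف", "كرة", "لاعب"],
    ["سوق", "أسهم", "بنك"],
    ["بنك", "سوق", "اقتصاد"],
]
train_labels = ["sports", "sports", "economy", "economy"]

features = select_features(train_docs, train_labels, k=4)
df = Counter(t for d in train_docs for t in set(d))  # document frequency per term
n = len(train_docs)
train_vecs = [tfidf_vector(d, features, df, n) for d in train_docs]
query = tfidf_vector(["كرة", "هدف"], features, df, n)
prediction = knn_predict(train_vecs, train_labels, query, k=3)
print(prediction)  # sports
```

In this sketch each article receives exactly one label, so every misclassification counts once as a false positive and once as a false negative; micro-averaged F1 therefore equals accuracy here and diverges from it only in multi-label settings.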