Machine learning for Arabic text categorization

Authors:
Rehab M. Duwairi
Affiliations:
Department of Computer Information Systems, Jordan University of Science and Technology, P.O. Box 3030, Irbid, Jordan
Venue:
Journal of the American Society for Information Science and Technology
Year:
2006

Abstract

In this article we propose a distance-based classifier for categorizing Arabic text. Each category is represented as a vector of words in an m-dimensional space, and a document is assigned to the category whose feature vector it lies closest to. In its learning phase, the classifier scans the set of training documents to extract features that capture the inherent, category-specific properties of each category. In its testing phase, it uses these previously determined features to categorize unclassified documents. Stemming was used to reduce the dimensionality of the documents' feature vectors. The accuracy of the classifier was tested by carrying out several categorization tasks on an Arabic corpus collected in-house. The results show that the proposed classifier is very accurate and robust. © 2006 Wiley Periodicals, Inc.
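
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation. The abstract does not name the distance measure, the feature-weighting scheme, or the stemmer, so the sketch assumes cosine similarity over raw term counts and uses a placeholder stem function that returns each word unchanged. All function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def stem(word):
    # Assumption: placeholder only. A real system would apply an Arabic
    # stemmer here to reduce the number of distinct features.
    return word


def vectorize(text):
    """Turn a document into a sparse term-frequency vector of stems."""
    return Counter(stem(w) for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed closeness measure)."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labeled_docs):
    """Learning phase: accumulate term counts into one vector per category."""
    category_vectors = defaultdict(Counter)
    for text, category in labeled_docs:
        category_vectors[category].update(vectorize(text))
    return dict(category_vectors)


def classify(text, category_vectors):
    """Testing phase: pick the category whose vector is closest to the document."""
    doc = vectorize(text)
    return max(category_vectors, key=lambda c: cosine(doc, category_vectors[c]))


# Toy training set (hypothetical, for illustration only).
training = [
    ("الفريق فاز في المباراة", "sports"),
    ("اللاعب سجل هدف في المباراة", "sports"),
    ("البنك رفع سعر الفائدة", "economy"),
    ("السوق و البنك و الاستثمار", "economy"),
]
model = train(training)
prediction = classify("المباراة انتهت و الفريق فاز", model)
print(prediction)
```

Replacing the placeholder stem function with a real Arabic stemmer would merge inflected forms of a word into a single feature, which is the dimensionality reduction the abstract refers to.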