International Journal of Information Technology and Web Engineering
This paper deals with the automatic classification of Arabic web documents. Such classification is useful for supporting directory search, a feature that many web portals and search engines use to cope with the ever-increasing number of documents on the web. Naive Bayes (NB), a statistical machine learning algorithm, is used to assign non-vocalized Arabic web documents, after their words have been reduced to their canonical forms (i.e., roots), to one of five predefined categories. The NB categorizer is evaluated with cross-validation experiments on a data set of 300 web documents per category. In the leave-one-out experiment, using 2,000 terms/roots, categorization accuracy varies from one category to another, with an average accuracy over all categories of 68.78%. The best per-category performance during cross-validation reaches 92.8%. Further tests on a manually collected evaluation set of 10 documents from each of the 5 categories give an overall classification accuracy of 62%, with the best per-category result reaching 90%.
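As an illustration of the evaluation setup described above, the following is a minimal sketch of a Naive Bayes categorizer scored with leave-one-out cross-validation. The abstract does not state which NB variant or smoothing the authors used, so the multinomial model with add-one smoothing is an assumption, as are all function names. The six toy documents and their transliterated "roots" are hypothetical placeholders, not data from the paper, and root extraction is assumed to have happened upstream.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document counts, per-class term counts, and the vocabulary."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (multinomial NB, add-one smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_terms = sum(term_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # log prior
        for t in tokens:
            if t in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][t] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(docs, labels):
    """Hold out each document once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(docs)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        if predict_nb(model, docs[i]) == labels[i]:
            correct += 1
    return correct / len(docs)


# Hypothetical toy corpus: each document is a list of already-extracted roots.
docs = [
    ["drs", "elm", "ktb"],
    ["elm", "drs", "mdrs"],
    ["ktb", "elm", "drs"],
    ["leb", "fwz", "krh"],
    ["fwz", "leb", "hdf"],
    ["krh", "leb", "fwz"],
]
labels = ["education"] * 3 + ["sports"] * 3

print("leave-one-out accuracy:", leave_one_out_accuracy(docs, labels))
```

Restricting the vocabulary to a fixed number of terms (2,000 roots in the reported experiment) would be an extra step before training; the selection criterion is not given in the abstract, so it is left out of this sketch.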