The Effect of Stemming on Arabic Text Classification: An Empirical Study

Authors:
Izzat Alsmadi; Mohammed Al-Kabi; Abdullah Wahbeh; Qasem Al-Radaideh; Emad Al-Shawakfa
Affiliations:
Yarmouk University, Jordan; Yarmouk University, Jordan; Dakota State University, USA; Yarmouk University, Jordan; Yarmouk University, Jordan
Venue:
International Journal of Information Retrieval Research
Year:
2011

Abstract

The information world is rich in documents held in different formats and applications, such as databases, digital libraries, and the Web. Text classification supports the search functionality of search engines and information retrieval systems, helping them cope with the large number of documents on the Web. Much text classification research has targeted English, Dutch, Chinese, and other languages, whereas far less has addressed Arabic. This paper addresses the automatic classification of Arabic text documents. It applies text classification to Arabic documents with stemming as one of the preprocessing steps. The results show that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy under the two test modes, at 87.79% and 88.54%. Stemming, on the other hand, lowered accuracy: the SVM accuracy under the two test modes dropped to 84.49% and 86.35%.
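To make "stemming as one of the preprocessing steps" concrete, here is a minimal sketch of the idea. The abstract does not name the stemmer or the feature representation used, so everything below is an assumption for illustration: the affix lists, the `light_stem` and `bag_of_words` helpers, and the sample sentence are hypothetical and are not the authors' method.

```python
from collections import Counter

# Hypothetical affix lists, for illustration only.
# They are NOT the stemmer evaluated in the paper.
PREFIXES = ("وال", "ال")            # "and the", "the"
SUFFIXES = ("ات", "ون", "ين", "ة")  # common plural / feminine endings


def light_stem(token: str) -> str:
    """Strip at most one listed prefix and one listed suffix.

    An affix is removed only if at least three letters remain,
    a common guard in light stemmers against over-stripping.
    """
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


def bag_of_words(text: str, stem: bool) -> Counter:
    """Whitespace-tokenize and count terms, optionally stemming first."""
    tokens = text.split()
    if stem:
        tokens = [light_stem(t) for t in tokens]
    return Counter(tokens)


# Four surface forms: "the book", "book", "the teachers", "teachers".
doc = "الكتاب كتاب المعلمون معلمين"
raw = bag_of_words(doc, stem=False)
stemmed = bag_of_words(doc, stem=True)

print(len(raw), "distinct terms without stemming")
print(len(stemmed), "distinct terms with stemming")
```

In this toy input, stemming merges related surface forms into a shared term, which shrinks the feature space a classifier such as an SVM is trained on. Whether that merging helps or hurts is an empirical question; the accuracies reported in the abstract indicate that, in this study, it lowered SVM accuracy.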