Feature sub-set selection metrics for Arabic text classification

Authors:
Abdelwadood Moh'd Mesleh
Affiliations:
Computer Engineering Department, Faculty of Engineering Technology, Al-Balqa' Applied University, Amman, Jordan
Venue:
Pattern Recognition Letters
Year:
2011

Abstract

Feature sub-set selection (FSS) is an important step in building effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The study is restricted to the support vector machine (SVM) classifier and to Arabic articles. The evaluation uses a corpus of 7842 documents independently classified into ten categories. Experimental results are reported in terms of macro-averaged precision, macro-averaged recall, and macro-averaged F1 measures. The results reveal that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks.
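
As a rough illustration of two quantities named in the abstract, the sketch below computes a Chi-square score for one term and one category from a 2x2 document-count table, and macro-averaged precision, recall and F1 from per-category counts. It is not taken from the paper: the function names, the count layout, and the choice to define macro F1 as the mean of per-category F1 values are all assumptions made for this example, and the paper's own formulation may differ.

```python
from typing import Dict, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for a category (standard 2x2 form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_scores(
    counts: Dict[str, Tuple[int, int, int]]
) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1.

    counts maps a category name to (true positives, false positives,
    false negatives). Each category contributes equally to the average,
    regardless of how many documents it holds. Assumption: macro F1 is
    the mean of the per-category F1 values.
    """
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    k = len(counts)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(chi_square(10, 0, 0, 10))
    print(macro_scores({"sports": (8, 2, 2), "economy": (1, 1, 3)}))
```

In a feature selection setting, a score such as `chi_square` would typically be computed for every term and category pair, and only the highest-scoring terms kept as features before training the classifier.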