Comparison of text feature selection policies and using an adaptive framework

Authors:
Şerafettin Taşcı; Tunga Güngör
Affiliations:
Boğaziçi University, Computer Engineering Department, Bebek, 34342 İstanbul, Turkey;Boğaziçi University, Computer Engineering Department, Bebek, 34342 İstanbul, Turkey
Venue:
Expert Systems with Applications: An International Journal
Year:
2013

Abstract

Text categorization is the task of automatically assigning unlabeled text documents to predefined category labels by means of an induction algorithm. Because the data in text categorization are high-dimensional, feature selection is often used to reduce the dimensionality. In this paper, we evaluate and compare the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets that vary in size, complexity, and skewness. We use a support vector machine as the classifier and tf-idf weighting for the terms. In addition to evaluating the policies, we propose new feature selection metrics that show high success rates, especially with a small number of keywords. These metrics are two-sided local metrics and are based on the difference between the distribution of a term in the documents belonging to a class and its distribution in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting a different number of terms for each class, and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.
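
The abstract does not give the formulas of the proposed metrics, so the following is only a minimal sketch of what a two-sided local metric with a per-class keyword budget could look like. It assumes the score of a term for a class is the absolute difference between the fraction of in-class documents and the fraction of out-of-class documents that contain the term. The function names `doc_frequency_difference` and `select_keywords`, and the `k_per_class` parameter, are hypothetical and are not taken from the paper.

```python
def doc_frequency_difference(docs, labels, target):
    """Score each term by |P(term | class) - P(term | not class)|.

    Assumption: probabilities are estimated from document frequencies.
    This is an illustrative metric, not the one defined in the paper.
    """
    pos = [set(d) for d, y in zip(docs, labels) if y == target]
    neg = [set(d) for d, y in zip(docs, labels) if y != target]
    vocab = set().union(*pos, *neg)
    scores = {}
    for term in vocab:
        p_pos = sum(term in d for d in pos) / len(pos) if pos else 0.0
        p_neg = sum(term in d for d in neg) / len(neg) if neg else 0.0
        # Two-sided: a term that is typical of the *other* classes
        # scores as high as a term that is typical of this class.
        scores[term] = abs(p_pos - p_neg)
    return scores


def select_keywords(docs, labels, k_per_class):
    """Pick the top-k terms per class, with a possibly different k per class.

    `k_per_class` is a hypothetical mapping {class label: number of keywords};
    how the paper chooses these numbers is not stated in the abstract.
    """
    selected = {}
    for target, k in k_per_class.items():
        scores = doc_frequency_difference(docs, labels, target)
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[target] = ranked[:k]
    return selected


if __name__ == "__main__":
    docs = [
        ["goal", "match", "team"],
        ["team", "goal"],
        ["stock", "market"],
        ["market", "team"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(select_keywords(docs, labels, {"sport": 2, "finance": 1}))
```

In this toy corpus, "market" ranks highly for the "sport" class even though it never occurs in sport documents, which is the sense in which such a metric is two-sided: negative evidence counts as much as positive evidence.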