Text categorization is the task of automatically assigning unlabeled text documents to predefined category labels by means of an induction algorithm. Because text categorization data are high-dimensional, feature selection is often used to reduce the dimensionality. In this paper, we evaluate and compare the feature selection policies used in text categorization by employing several popular feature selection metrics. For the experiments, we use datasets that vary in size, complexity, and skewness. We use a support vector machine (SVM) as the classifier and tf-idf weighting for the terms. In addition to evaluating the policies, we propose new feature selection metrics that achieve high success rates, especially when the number of keywords is small. These metrics are two-sided local metrics. They are based on the difference between the distribution of a term in the documents belonging to a class and its distribution in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It selects a different number of terms for each class, and it yields a significant improvement on skewed datasets in which some classes have a limited number of training instances.
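The abstract does not give the exact formulas, so the following is only a minimal sketch of the ideas it names, built on several assumptions. First, the two-sided local score is a hypothetical stand-in, not the paper's metric. It is the absolute difference between the fraction of in-class and out-of-class documents that contain a term, so both positive and negative evidence score high. Second, adaptive keyword selection is modeled as a caller-supplied keyword budget per class; how the paper chooses those budgets is not stated here. Third, tf-idf uses `log(N / df)` with L2 normalization. All function names (`class_doc_freqs`, `two_sided_score`, `select_keywords`, `tfidf_vectors`) and the toy corpus are hypothetical. SVM training is omitted.

```python
import math
from collections import Counter

# Toy training data (hypothetical): each document is a list of tokens.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match"],
    ["vote", "election"],
    ["election", "party", "vote"],
    ["team", "vote"],
]
labels = ["sport", "sport", "politics", "politics", "politics"]


def class_doc_freqs(docs, labels, target):
    """Document frequencies of each term inside and outside class `target`."""
    df_in, df_out = Counter(), Counter()
    n_in = n_out = 0
    for tokens, label in zip(docs, labels):
        terms = set(tokens)  # count each term once per document
        if label == target:
            df_in.update(terms)
            n_in += 1
        else:
            df_out.update(terms)
            n_out += 1
    return df_in, df_out, n_in, n_out


def two_sided_score(term, df_in, df_out, n_in, n_out):
    """Hypothetical two-sided local metric: |P(t | c) - P(t | not c)|.

    A term scores high if it is much more common inside the class (positive
    evidence) or much more common outside it (negative evidence).
    """
    p_in = df_in[term] / n_in if n_in else 0.0
    p_out = df_out[term] / n_out if n_out else 0.0
    return abs(p_in - p_out)


def select_keywords(docs, labels, k_per_class):
    """Rank terms separately for each class and keep k_per_class[c] of them.

    Classes may receive different budgets; choosing those budgets is left
    to the caller (an assumption of this sketch).
    """
    vocab = sorted({t for tokens in docs for t in tokens})
    selected = {}
    for c, k in k_per_class.items():
        df_in, df_out, n_in, n_out = class_doc_freqs(docs, labels, c)
        ranked = sorted(
            vocab,
            # highest score first; ties broken alphabetically for determinism
            key=lambda t: (-two_sided_score(t, df_in, df_out, n_in, n_out), t),
        )
        selected[c] = ranked[:k]
    return selected


def tfidf_vectors(docs, keywords):
    """Sparse tf-idf vectors restricted to `keywords`, L2-normalized."""
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens) if t in keywords)
    idf = {t: math.log(n / df[t]) for t in keywords if df[t]}
    vectors = []
    for tokens in docs:
        tf = Counter(t for t in tokens if t in idf)
        vec = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


keywords = select_keywords(docs, labels, {"sport": 2, "politics": 3})
vectors = tfidf_vectors(docs, set().union(*keywords.values()))
print(keywords)
print(vectors[2])
```

On this toy data the first print shows `{'sport': ['goal', 'vote'], 'politics': ['goal', 'vote', 'election']}`. The term `vote` is kept for `sport` because its absence from that class is informative, which is what "two-sided" refers to. The resulting vectors would then be passed to a linear SVM.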