Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Feature Selection for Unbalanced Class Distribution and Naive Bayes. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning.
Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter.
Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. ACM SIGKDD Explorations Newsletter.
Feature selection for text categorization on imbalanced data. ACM SIGKDD Explorations Newsletter.
An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing (TALIP).
Bias Analysis in Text Classification for Highly Skewed Data. ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining.
A novel feature selection algorithm for text categorization. Expert Systems with Applications.
Imbalanced text classification: A term weighting approach. Expert Systems with Applications.
Feature selection with a measure of deviations from Poisson in text categorization. Expert Systems with Applications.
Learning when training data are costly: the effect of class distribution on tree induction. Journal of Artificial Intelligence Research.
The foundations of cost-sensitive learning. IJCAI'01 Proceedings of the 17th International Joint Conference on Artificial Intelligence, Volume 2.
Distinctive characteristics of a metric using deviations from Poisson for feature selection. Expert Systems with Applications.
Incorporating game theory in feature selection for text categorization. RSFDGrC'11 Proceedings of the 13th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing.
Feature evaluation and selection with cooperative game theory. Pattern Recognition.
Class-indexing-based term weighting for automatic text classification. Information Sciences.
Comparison of text feature selection policies and using an adaptive framework. Expert Systems with Applications.
Abstract: Class imbalance problems are often encountered in real applications of automatic text classification, especially in the so-called "one-against-all" setting, so handling them with satisfactory performance is of substantial importance. In this paper, we focus on a feature selection scheme for addressing this problem and explore the abilities and characteristics of various feature selection metrics. We examine three types of metrics. Type-I: χ_P² and the Gini index; Type-II: χ² and information gain; Type-III: signed χ² and signed information gain. Type-I and Type-II metrics implicitly combine positive and negative features, which indicate membership and nonmembership of the positive class, respectively. Type-III metrics are used in a combination framework in which positive and negative features are explicitly combined and the degree of combination is optimized to improve performance in imbalanced situations. Our experimental results show that feature selection with Type-I metrics on imbalanced data sets achieves classification performance comparable to that of the combination framework with Type-III metrics, and clearly superior to that of Type-II metrics. This indicates that Type-I metrics serve as simpler alternatives to the combination framework. The characteristic behavior and performance of each metric are also investigated closely in terms of the distribution and quality of the selected features.
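Not part of the paper itself, but as a rough sketch of the metrics the abstract names: the χ², signed χ², and information-gain scores for a term can be computed from its 2x2 term/class contingency table using the textbook definitions. All counts and function names below are hypothetical illustrations; the Gini index and χ_P² variants are omitted because their exact formulations are paper-specific.

```python
import math

# Hypothetical 2x2 contingency counts for a term t and the positive class:
# a = positive docs containing t, b = negative docs containing t,
# c = positive docs without t,   d = negative docs without t.

def chi2(a, b, c, d):
    """Standard chi-squared feature selection score (always non-negative)."""
    n = a + b + c + d
    den = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / den if den else 0.0

def signed_chi2(a, b, c, d):
    """Chi-squared carrying the sign of the term/class correlation:
    positive for membership-indicating (positive) features,
    negative for nonmembership-indicating (negative) features."""
    return math.copysign(chi2(a, b, c, d), a * d - c * b)

def info_gain(a, b, c, d):
    """Information gain: reduction in class entropy from observing the term."""
    n = a + b + c + d
    def entropy(*counts):
        tot = sum(counts)
        return -sum(x / tot * math.log2(x / tot) for x in counts if x)
    prior = entropy(a + c, b + d)                   # H(class)
    cond = ((a + b) / n) * entropy(a, b) \
         + ((c + d) / n) * entropy(c, d)            # H(class | term)
    return prior - cond

if __name__ == "__main__":
    print(chi2(40, 10, 10, 40))                 # 36.0
    print(signed_chi2(10, 40, 40, 10))          # -36.0
    print(round(info_gain(40, 10, 10, 40), 3))  # 0.278
```

Note how χ² and information gain assign the same score to a strongly positive and a strongly negative feature (the two calls above are mirror images), whereas the signed variant distinguishes them; this is exactly the property the combination framework in the abstract exploits.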