Many feature selection methods have been proposed for text categorization. However, their performance is usually verified only by experiments, so the results depend on the corpora used and may not be accurate. This paper proposes a novel feature selection framework, Distribution-Based Feature Selection (DBFS), based on the distribution difference of features. The framework generalizes most state-of-the-art feature selection methods, including OCFS, MI, ECE, IG, CHI and OR. Using the components of the framework, the performance of many feature selection methods can be estimated by theoretical analysis. DBFS also sheds light on the merits and drawbacks of many existing feature selection methods and helps to select suitable methods for specific domains. Moreover, a weighted model based on DBFS is given, from which feature selection methods suited to unbalanced datasets can be derived. The experimental results show that the derived methods are more effective than CHI, IG and OCFS on both balanced and unbalanced datasets.
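For readers unfamiliar with the baselines named above, the following is a minimal sketch of how two of them, CHI (chi-square) and IG (information gain), are commonly computed for a single term and a single category from document counts. It uses the standard textbook definitions and does not implement DBFS, whose formulas are not given here. The function names and the 2x2 count layout (`a`, `b`, `c`, `d`) are assumptions made for illustration.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square score of a term for one category (illustrative layout).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A row or column of the table is empty, so the term carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def _entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log2(x / total) for x in counts if x > 0)


def information_gain(a, b, c, d):
    """Information gain of a term for a binary category split.

    Computed as H(class) - H(class | term present or absent),
    using the same count layout as chi_square.
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    h_class = _entropy([a + c, b + d])
    h_given_term = ((a + b) / n) * _entropy([a, b]) \
        + ((c + d) / n) * _entropy([c, d])
    return h_class - h_given_term


if __name__ == "__main__":
    # A term that perfectly separates two classes of 10 documents each.
    print(chi_square(10, 0, 0, 10), information_gain(10, 0, 0, 10))
    # A term that is independent of the class.
    print(chi_square(5, 5, 5, 5), information_gain(5, 5, 5, 5))
```

Both scores depend only on how the term's occurrences are distributed across categories. Ranking every term by such a score and keeping the top-scoring ones is the usual way these baselines are applied.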