Text data sets are high-dimensional and therefore hard to analyze: many weakly relevant but redundant features hurt the generalization performance of classifiers. Previous work addresses this problem using pair-wise feature similarities, which ignore the discriminative contribution of each feature because they do not exploit the label information. Here we define an Approximate Markov Blanket (AMB) based on the metric of DIScriminative Contribution (DISC) to eliminate redundant features, and propose the AMB-DISC algorithm. Experimental results on the Reuters-21578 data set show that AMB-DISC substantially outperforms previous state-of-the-art feature selection algorithms that account for feature redundancy, in terms of MicroavgF1 and MacroavgF1.
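To illustrate the general idea of approximate-Markov-blanket filtering described above, the sketch below uses a generic FCBF-style test: features are ranked by a label-aware relevance score (mutual information with the class), and a feature is discarded when an already-selected, more relevant feature "covers" it, i.e. is at least as informative about it as the label is. Note that the paper's actual DISC metric is not specified in this abstract; the mutual-information relevance score and the covering test here are stand-in assumptions, not the authors' definitions.

```python
import numpy as np

def mutual_info(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def amb_filter(X, y, delta=0.0):
    """FCBF-style approximate-Markov-blanket filter (a generic stand-in
    for AMB-DISC; the DISC metric itself is not given in the abstract).

    X : (n_samples, n_features) array of discrete feature values
    y : (n_samples,) array of class labels
    delta : minimum relevance threshold for keeping a feature
    Returns the indices of the selected (non-redundant) features.
    """
    n_features = X.shape[1]
    # Label-aware relevance of each feature, most relevant first.
    rel = np.array([mutual_info(X[:, j], y) for j in range(n_features)])
    order = np.argsort(-rel, kind="stable")
    selected = []
    for j in order:
        if rel[j] <= delta:
            continue  # weakly relevant: drop outright
        # Approximate-Markov-blanket test: feature j is redundant if some
        # already-selected feature i satisfies I(f_i; f_j) >= I(f_j; y).
        if all(mutual_info(X[:, i], X[:, j]) < rel[j] for i in selected):
            selected.append(j)
    return selected
```

For example, if one feature is an exact copy of another, the copy is covered by the original and removed, while an uninformative feature is dropped by the relevance threshold.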