Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is inconsistent with margin maximization, the principle central to SVM learning. We therefore propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization compared with RFE. Moreover, for the nonlinear-kernel case, we show that RFE relies on the assumption that the squared 2-norm of the weight vector strictly decreases as features are eliminated. We demonstrate that this assumption fails for the Gaussian kernel and that, consequently, RFE may give poor results in this case. MFE for nonlinear kernels yields better margin and generalization. We also present an extension that achieves further margin gains by optimizing only two degrees of freedom (the hyperplane's intercept and its squared 2-norm) with the orientation of the weight vector held fixed. Finally, we introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
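To make the contrast concrete, below is a minimal sketch, for the linear-kernel case only, of the two elimination criteria the abstract compares: RFE drops the feature with the smallest squared weight, while a margin-based step drops the feature whose removal leaves the largest minimum margin over the training set. The function names (rfe_step, mfe_step), the toy data, and the fixed weight vector are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Assumed given: a trained linear SVM with weight vector w, intercept b,
# training data X (n x d), and labels y in {-1, +1}.

def rfe_step(w):
    """RFE criterion: drop the feature with the smallest squared weight."""
    return int(np.argmin(w ** 2))

def mfe_step(w, b, X, y):
    """Margin-based criterion (MFE idea, sketched): drop the feature whose
    removal leaves the largest minimum geometric margin on the training set,
    with the remaining weight-vector components held fixed."""
    best_margin, best_j = -np.inf, None
    for j in range(len(w)):
        keep = np.arange(len(w)) != j           # candidate retained features
        w_j, X_j = w[keep], X[:, keep]
        norm = np.linalg.norm(w_j)
        if norm == 0:
            continue
        # Minimum signed distance of training points to the reduced hyperplane.
        margin = np.min(y * (X_j @ w_j + b)) / norm
        if margin > best_margin:
            best_margin, best_j = margin, j
    return best_j

# Toy example (illustrative only): the two criteria can disagree.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=20))
w = np.array([1.0, 0.2, 0.15, 0.1, 0.05])
b = 0.0
print("RFE removes feature", rfe_step(w))
print("MFE removes feature", mfe_step(w, b, X, y))
```

In the iterative setting the abstract describes, one such elimination step would be applied repeatedly; the paper's extension would additionally re-optimize the hyperplane's intercept and squared 2-norm after each step, with the weight vector's orientation kept fixed.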