Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is inconsistent with margin maximization, the principle central to SVM learning. We therefore propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization compared with RFE. Moreover, for the nonlinear-kernel case, we show that RFE relies on the assumption that the squared 2-norm of the weight vector strictly decreases as features are eliminated. We demonstrate that this assumption fails for the Gaussian kernel and that, consequently, RFE may give poor results in this case. MFE for nonlinear kernels yields better margin and generalization. We also present an extension that achieves further margin gains by optimizing only two degrees of freedom (the hyperplane's intercept and its squared 2-norm) with the orientation of the weight vector held fixed. Finally, we introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
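To make the contrast concrete, below is a minimal sketch, for the linear-kernel case only, of the two elimination criteria the abstract compares: RFE drops the feature with the smallest squared weight, while a margin-based step drops the feature whose removal leaves the largest minimum margin over the training set. The function names (rfe_step, mfe_step), the toy data, and the fixed weight vector are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Assumed given: a trained linear SVM with weight vector w, intercept b,
# training data X (n x d), and labels y in {-1, +1}.

def rfe_step(w):
    """RFE criterion: drop the feature with the smallest squared weight."""
    return int(np.argmin(w ** 2))

def mfe_step(w, b, X, y):
    """Margin-based criterion (MFE idea, sketched): drop the feature whose
    removal leaves the largest minimum geometric margin on the training set,
    with the remaining weight-vector components held fixed."""
    best_margin, best_j = -np.inf, None
    for j in range(len(w)):
        keep = np.arange(len(w)) != j           # candidate retained features
        w_j, X_j = w[keep], X[:, keep]
        norm = np.linalg.norm(w_j)
        if norm == 0:
            continue
        # Minimum signed distance of training points to the reduced hyperplane.
        margin = np.min(y * (X_j @ w_j + b)) / norm
        if margin > best_margin:
            best_margin, best_j = margin, j
    return best_j

# Toy example (illustrative only): the two criteria can disagree.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=20))
w = np.array([1.0, 0.2, 0.15, 0.1, 0.05])
b = 0.0
print("RFE removes feature", rfe_step(w))
print("MFE removes feature", mfe_step(w, b, X, y))
```

In the iterative setting the abstract describes, one such elimination step would be applied repeatedly; the paper's extension would additionally re-optimize the hyperplane's intercept and squared 2-norm after each step, with the weight vector's orientation kept fixed.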